Dataset statistics
| Number of variables | 23 |
|---|---|
| Number of observations | 45542 |
| Missing cells | 102661 |
| Missing cells (%) | 9.8% |
| Duplicate rows | 52 |
| Duplicate rows (%) | 0.1% |
| Total size in memory | 8.0 MiB |
| Average record size in memory | 184.0 B |
Variable types
| Categorical | 15 |
|---|---|
| Numeric | 7 |
| Unsupported | 1 |
Alerts
| Alert | Type |
|---|---|
| Dataset has 52 (0.1%) duplicate rows | Duplicates |
| belongs_to_collection has a high cardinality: 1645 distinct values | High cardinality |
| genres has a high cardinality: 4069 distinct values | High cardinality |
| id has a high cardinality: 45436 distinct values | High cardinality |
| original_language has a high cardinality: 92 distinct values | High cardinality |
| overview has a high cardinality: 44307 distinct values | High cardinality |
| poster_path has a high cardinality: 45024 distinct values | High cardinality |
| production_companies has a high cardinality: 22581 distinct values | High cardinality |
| production_countries has a high cardinality: 2387 distinct values | High cardinality |
| release_date has a high cardinality: 17333 distinct values | High cardinality |
| spoken_languages has a high cardinality: 1843 distinct values | High cardinality |
| tagline has a high cardinality: 20283 distinct values | High cardinality |
| title has a high cardinality: 42277 distinct values | High cardinality |
| cast has a high cardinality: 42663 distinct values | High cardinality |
| crew has a high cardinality: 42899 distinct values | High cardinality |
| original_language is highly imbalanced (67.6%) | Imbalance |
| production_countries is highly imbalanced (58.4%) | Imbalance |
| spoken_languages is highly imbalanced (61.2%) | Imbalance |
| status is highly imbalanced (96.9%) | Imbalance |
| return has 2035 (4.5%) infinite values | Infinite |
| belongs_to_collection has 41039 (90.1%) missing values | Missing |
| overview has 954 (2.1%) missing values | Missing |
| tagline has 25103 (55.1%) missing values | Missing |
| return has 34592 (76.0%) missing values | Missing |
| id is uniformly distributed | Uniform |
| overview is uniformly distributed | Uniform |
| poster_path is uniformly distributed | Uniform |
| tagline is uniformly distributed | Uniform |
| title is uniformly distributed | Uniform |
| popularity is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
| budget has 36627 (80.4%) zeros | Zeros |
| revenue has 38114 (83.7%) zeros | Zeros |
| runtime has 1559 (3.4%) zeros | Zeros |
| vote_average has 3005 (6.6%) zeros | Zeros |
| vote_count has 2906 (6.4%) zeros | Zeros |
| return has 3522 (7.7%) zeros | Zeros |
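The counts in the alerts are consistent with `return` being the ratio revenue / budget: the infinite and missing values of `return` add up to the zeros of `budget` (2035 + 34592 = 36627), and the zero and missing values of `return` add up to the zeros of `revenue` (3522 + 34592 = 38114). The report does not state how `return` was derived, so the following is only a sketch under that assumption; the function names are hypothetical.

```python
import math
from typing import Optional


def compute_return(revenue: float, budget: float) -> float:
    """Ratio revenue / budget, mimicking floating-point division by zero.

    Assumption: the `return` column was built this way; the report
    itself does not say so.
    """
    if budget == 0:
        if revenue == 0:
            return math.nan  # 0 / 0 is undefined, shown as missing
        return math.copysign(math.inf, revenue)  # x / 0 is infinite
    return revenue / budget


def usable_return(revenue: float, budget: float) -> Optional[float]:
    """Return the ratio only when it is a finite number, else None."""
    value = compute_return(revenue, budget)
    return value if math.isfinite(value) else None


# Made-up (revenue, budget) pairs, not rows from the dataset.
rows = [(20, 10), (0, 0), (5000, 0), (0, 1_000_000)]
print([usable_return(r, b) for r, b in rows])
```

Mapping both the infinite and the undefined cases to `None` keeps them out of averages and plots, at the cost of no longer distinguishing "no budget recorded" from "nothing recorded at all".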
Reproduction
| Analysis started | 2023-06-11 00:51:55.777028 |
|---|---|
| Analysis finished | 2023-06-11 00:56:10.774076 |
| Duration | 4 minutes and 15 seconds |
| Software version | pandas-profiling v3.6.6 |
| Download configuration | config.json |
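The section above records only the software version and a downloadable `config.json`. A minimal sketch of how a report of this kind is typically generated with pandas-profiling 3.x, assuming the data is stored in a CSV file; the function name, file paths and title are hypothetical, and the settings from `config.json` are not reproduced here.

```python
def build_report(csv_path: str, out_path: str = "report.html") -> None:
    """Profile a CSV file and write the result as an HTML report."""
    # Imported lazily so this module loads even without the packages.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Dataset profile")
    profile.to_file(out_path)
```

Calling `build_report("movies.csv")` (a hypothetical file name) would write `report.html` to the working directory.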
belongs_to_collection
Categorical
HIGH CARDINALITY  MISSING 
| Distinct | 1645 |
|---|---|
| Distinct (%) | 36.5% |
| Missing | 41039 |
| Missing (%) | 90.1% |
| Memory size | 355.9 KiB |
| [] | 110 |
|---|---|
| ['The Bowery Boys'] | 29 |
| ['Totò Collection'] | 27 |
| ['Pokémon Collection'] | 26 |
| ['James Bond Collection'] | 26 |
| Other values (1640) |
Length
| Max length | 58 |
|---|---|
| Median length | 46 |
| Mean length | 27.123917 |
| Min length | 2 |
Characters and Unicode
| Total characters | 122139 |
|---|---|
| Distinct characters | 165 |
| Distinct categories | 12 |
| Distinct scripts | 7 |
| Distinct blocks | 8 |
Unique
| Unique | 374 |
|---|---|
| Unique (%) | 8.3% |
Sample
| 1st row | ['Toy Story Collection'] |
|---|---|
| 2nd row | ['Grumpy Old Men Collection'] |
| 3rd row | ['Father of the Bride Collection'] |
| 4th row | ['James Bond Collection'] |
| 5th row | ['Balto Collection'] |
Common Values
| Value | Count | Frequency (%) |
| [] | 110 | 0.2% |
| ['The Bowery Boys'] | 29 | 0.1% |
| ['Totò Collection'] | 27 | 0.1% |
| ['Pokémon Collection'] | 26 | 0.1% |
| ['James Bond Collection'] | 26 | 0.1% |
| ['Zatôichi: The Blind Swordsman'] | 26 | 0.1% |
| ['The Carry On Collection'] | 25 | 0.1% |
| ['Charlie Chan (Sidney Toler) Collection'] | 21 | < 0.1% |
| ['Godzilla (Showa) Collection'] | 16 | < 0.1% |
| ['Charlie Chan (Warner Oland) Collection'] | 15 | < 0.1% |
| Other values (1635) | 4182 | 9.2% |
| (Missing) | 41039 |
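The values in this column are strings that look like Python lists, and the empty list `[]` (110 rows) is counted as a regular value rather than as missing. A possible way to unpack them, assuming each cell holds at most one collection name; `parse_collection` is a hypothetical helper, not part of the report.

```python
import ast
from typing import Optional


def parse_collection(cell: object) -> Optional[str]:
    """Turn "['Toy Story Collection']" into 'Toy Story Collection'.

    Missing cells, malformed cells and the empty list "[]" map to None.
    """
    if not isinstance(cell, str):
        return None
    try:
        items = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return None  # not a valid Python literal
    if not isinstance(items, list) or not items:
        return None
    return str(items[0])


print(parse_collection("['Toy Story Collection']"))
print(parse_collection("[]"))
```

With this mapping the 110 `[]` rows would be treated the same way as the 41039 missing ones.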
Words
| Value | Count | Frequency (%) |
| collection | 3659 | |
| the | 1139 | 7.8% |
| (blank) | 243 | 1.7% |
| of | 229 | 1.6% |
| series | 146 | 1.0% |
| and | 84 | 0.6% |
| trilogy | 82 | 0.6% |
| a | 60 | 0.4% |
| man | 60 | 0.4% |
| in | 56 | 0.4% |
| Other values (2316) | 8784 |
Most occurring characters
| Value | Count | Frequency (%) |
| o | 10837 | 8.9% |
| e | 10233 | 8.4% |
| (space) | 10040 | 8.2% |
| l | 9935 | 8.1% |
| ' | 8786 | 7.2% |
| i | 7360 | 6.0% |
| n | 7222 | 5.9% |
| t | 6333 | 5.2% |
| c | 4743 | 3.9% |
| [ | 4508 | 3.7% |
| Other values (155) | 42142 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 79098 | |
| Uppercase Letter | 13562 | 11.1% |
| Space Separator | 10040 | 8.2% |
| Other Punctuation | 9245 | 7.6% |
| Open Punctuation | 4836 | 4.0% |
| Close Punctuation | 4836 | 4.0% |
| Decimal Number | 321 | 0.3% |
| Dash Punctuation | 150 | 0.1% |
| Other Letter | 37 | < 0.1% |
| Final Punctuation | 9 | < 0.1% |
| Other values (2) | 5 | < 0.1% |
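The category names in this table correspond to Unicode general categories (for example, Lowercase Letter is `Ll` and Open Punctuation is `Ps`). A small sketch of how such counts can be computed with the standard library; this illustrates the idea and is not the profiler's own implementation.

```python
import unicodedata
from collections import Counter
from typing import Iterable


def category_counts(values: Iterable[str]) -> Counter:
    """Count characters by Unicode general category code (e.g. 'Ll')."""
    counts: Counter = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


# Two of the sample rows shown earlier in this section.
sample = ["['Toy Story Collection']", "['Balto Collection']"]
print(category_counts(sample).most_common())
```

Because every value is wrapped in brackets and quotes, the `Ps`, `Pe` and `Po` counts mostly reflect the list syntax rather than the collection names themselves.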
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| o | 10837 | |
| e | 10233 | |
| l | 9935 | |
| i | 7360 | |
| n | 7222 | |
| t | 6333 | |
| c | 4743 | |
| a | 4333 | 5.5% |
| r | 3799 | 4.8% |
| s | 2454 | 3.1% |
| Other values (68) | 11849 |
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 4366 | |
| T | 1502 | 11.1% |
| S | 1053 | 7.8% |
| B | 667 | 4.9% |
| M | 609 | 4.5% |
| D | 501 | 3.7% |
| A | 490 | 3.6% |
| H | 447 | 3.3% |
| P | 419 | 3.1% |
| G | 412 | 3.0% |
| Other values (33) | 3096 |
Other Letter
| Value | Count | Frequency (%) |
| シ | 3 | 8.1% |
| 男 | 3 | 8.1% |
| は | 3 | 8.1% |
| つ | 3 | 8.1% |
| よ | 3 | 8.1% |
| ら | 3 | 8.1% |
| い | 3 | 8.1% |
| リ | 3 | 8.1% |
| ズ | 3 | 8.1% |
| 시 | 2 | 5.4% |
| Other values (4) | 8 |
Other Punctuation
| Value | Count | Frequency (%) |
| ' | 8786 | |
| . | 168 | 1.8% |
| : | 99 | 1.1% |
| , | 76 | 0.8% |
| & | 50 | 0.5% |
| ! | 34 | 0.4% |
| / | 21 | 0.2% |
| * | 4 | < 0.1% |
| ? | 4 | < 0.1% |
| … | 3 | < 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 80 | |
| 9 | 64 | |
| 3 | 54 | |
| 0 | 51 | |
| 2 | 21 | 6.5% |
| 8 | 13 | 4.0% |
| 5 | 12 | 3.7% |
| 7 | 11 | 3.4% |
| 6 | 10 | 3.1% |
| 4 | 5 | 1.6% |
Open Punctuation
| Value | Count | Frequency (%) |
| [ | 4508 | |
| ( | 328 | 6.8% |
Close Punctuation
| Value | Count | Frequency (%) |
| ] | 4508 | |
| ) | 328 | 6.8% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 148 | |
| – | 2 | 1.3% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 10040 |
Final Punctuation
| Value | Count | Frequency (%) |
| ’ | 9 |
Modifier Letter
| Value | Count | Frequency (%) |
| ー | 3 |
Other Number
| Value | Count | Frequency (%) |
| ½ | 2 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 92246 | |
| Common | 29442 | 24.1% |
| Cyrillic | 414 | 0.3% |
| Hiragana | 15 | < 0.1% |
| Hangul | 10 | < 0.1% |
| Katakana | 9 | < 0.1% |
| Han | 3 | < 0.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| o | 10837 | |
| e | 10233 | |
| l | 9935 | |
| i | 7360 | 8.0% |
| n | 7222 | 7.8% |
| t | 6333 | 6.9% |
| c | 4743 | 5.1% |
| C | 4366 | 4.7% |
| a | 4333 | 4.7% |
| r | 3799 | 4.1% |
| Other values (69) | 23085 |
Cyrillic
| Value | Count | Frequency (%) |
| л | 48 | 11.6% |
| и | 41 | 9.9% |
| о | 37 | 8.9% |
| к | 30 | 7.2% |
| е | 27 | 6.5% |
| я | 25 | 6.0% |
| а | 17 | 4.1% |
| К | 16 | 3.9% |
| ц | 16 | 3.9% |
| р | 14 | 3.4% |
| Other values (32) | 143 |
Common
| Value | Count | Frequency (%) |
| (space) | 10040 | |
| ' | 8786 | |
| [ | 4508 | |
| ] | 4508 | |
| ) | 328 | 1.1% |
| ( | 328 | 1.1% |
| . | 168 | 0.6% |
| - | 148 | 0.5% |
| : | 99 | 0.3% |
| 1 | 80 | 0.3% |
| Other values (20) | 449 | 1.5% |
Hiragana
| Value | Count | Frequency (%) |
| は | 3 | |
| つ | 3 | |
| よ | 3 | |
| ら | 3 | |
| い | 3 |
Hangul
| Value | Count | Frequency (%) |
| 시 | 2 | |
| 리 | 2 | |
| 즈 | 2 | |
| 객 | 2 | |
| 식 | 2 |
Katakana
| Value | Count | Frequency (%) |
| シ | 3 | |
| リ | 3 | |
| ズ | 3 |
Han
| Value | Count | Frequency (%) |
| 男 | 3 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 121425 | |
| Cyrillic | 414 | 0.3% |
| None | 246 | 0.2% |
| Hiragana | 15 | < 0.1% |
| Punctuation | 14 | < 0.1% |
| Katakana | 12 | < 0.1% |
| Hangul | 10 | < 0.1% |
| CJK | 3 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| o | 10837 | 8.9% |
| e | 10233 | 8.4% |
| (space) | 10040 | 8.3% |
| l | 9935 | 8.2% |
| ' | 8786 | 7.2% |
| i | 7360 | 6.1% |
| n | 7222 | 5.9% |
| t | 6333 | 5.2% |
| c | 4743 | 3.9% |
| [ | 4508 | 3.7% |
| Other values (67) | 41428 |
None
| Value | Count | Frequency (%) |
| é | 49 | |
| ä | 38 | |
| ô | 35 | |
| ò | 28 | |
| ö | 19 | 7.7% |
| ó | 14 | 5.7% |
| ı | 14 | 5.7% |
| í | 9 | 3.7% |
| İ | 4 | 1.6% |
| á | 4 | 1.6% |
| Other values (18) | 32 |
Cyrillic
| Value | Count | Frequency (%) |
| л | 48 | 11.6% |
| и | 41 | 9.9% |
| о | 37 | 8.9% |
| к | 30 | 7.2% |
| е | 27 | 6.5% |
| я | 25 | 6.0% |
| а | 17 | 4.1% |
| К | 16 | 3.9% |
| ц | 16 | 3.9% |
| р | 14 | 3.4% |
| Other values (32) | 143 |
Punctuation
| Value | Count | Frequency (%) |
| ’ | 9 | |
| … | 3 | 21.4% |
| – | 2 | 14.3% |
Katakana
| Value | Count | Frequency (%) |
| シ | 3 | |
| ー | 3 | |
| リ | 3 | |
| ズ | 3 |
CJK
| Value | Count | Frequency (%) |
| 男 | 3 |
Hiragana
| Value | Count | Frequency (%) |
| は | 3 | |
| つ | 3 | |
| よ | 3 | |
| ら | 3 | |
| い | 3 |
Hangul
| Value | Count | Frequency (%) |
| 시 | 2 | |
| 리 | 2 | |
| 즈 | 2 | |
| 객 | 2 | |
| 식 | 2 |
budget
Real number (ℝ)
| Distinct | 1223 |
|---|---|
| Distinct (%) | 2.7% |
| Missing | 3 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4223191.5 |
| Minimum | 0 |
|---|---|
| Maximum | 3.8 × 10⁸ |
| Zeros | 36627 |
| Zeros (%) | 80.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 355.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 25000000 |
| Maximum | 3.8 × 10⁸ |
| Range | 3.8 × 10⁸ |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 17413544 |
|---|---|
| Coefficient of variation (CV) | 4.1233139 |
| Kurtosis | 66.822108 |
| Mean | 4223191.5 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 7.127262 |
| Sum | 1.9231992 × 10¹¹ |
| Variance | 3.0323153 × 10¹⁴ |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 0 | 36627 | |
| 5000000 | 286 | 0.6% |
| 10000000 | 261 | 0.6% |
| 20000000 | 243 | 0.5% |
| 2000000 | 242 | 0.5% |
| 15000000 | 226 | 0.5% |
| 3000000 | 223 | 0.5% |
| 25000000 | 206 | 0.5% |
| 1000000 | 197 | 0.4% |
| 30000000 | 192 | 0.4% |
| Other values (1213) | 6836 | 15.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 36627 | |
| 1 | 25 | 0.1% |
| 2 | 14 | < 0.1% |
| 3 | 9 | < 0.1% |
| 4 | 10 | < 0.1% |
| 5 | 8 | < 0.1% |
| 6 | 5 | < 0.1% |
| 7 | 4 | < 0.1% |
| 8 | 5 | < 0.1% |
| 9 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 380000000 | 1 | < 0.1% |
| 300000000 | 1 | < 0.1% |
| 280000000 | 1 | < 0.1% |
| 270000000 | 1 | < 0.1% |
| 260000000 | 3 | < 0.1% |
| 258000000 | 1 | < 0.1% |
| 255000000 | 1 | < 0.1% |
| 250000000 | 10 | |
| 245000000 | 2 | < 0.1% |
| 237000000 | 1 | < 0.1% |
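With 80.4% of the budgets equal to zero, the median and both quartiles are zero as well, so the quantile statistics say little about films with a recorded budget. A sketch of summarising only the non-zero entries, under the assumption that zero stands for "unknown" (the report does not confirm this); `known_budgets` is a hypothetical helper.

```python
import statistics
from typing import Iterable, List, Optional


def known_budgets(budgets: Iterable[Optional[float]]) -> List[float]:
    """Drop missing entries and zeros, assumed here to mean 'unknown'."""
    return [b for b in budgets if b is not None and b > 0]


# Made-up sample, not rows from the dataset.
sample = [0, 0, 0, 5_000_000, 10_000_000, None, 25_000_000]
known = known_budgets(sample)
print(len(known), statistics.median(known))
```

The very small values in the minimum table (1 to 9) pass this filter, so a stricter lower bound may be worth considering before relying on the result.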
genres
Categorical
| Distinct | 4069 |
|---|---|
| Distinct (%) | 8.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 355.9 KiB |
| ['Drama'] | 5008 |
|---|---|
| ['Comedy'] | 3623 |
| ['Documentary'] | 2728 |
| [] | 2443 |
| ['Drama', 'Romance'] | 1303 |
| Other values (4064) |
Length
| Max length | 98 |
|---|---|
| Median length | 84 |
| Mean length | 21.601642 |
| Min length | 2 |
Characters and Unicode
| Total characters | 983782 |
|---|---|
| Distinct characters | 43 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 2364 |
|---|---|
| Unique (%) | 5.2% |
Sample
| 1st row | ['Animation', 'Comedy', 'Family'] |
|---|---|
| 2nd row | ['Adventure', 'Fantasy', 'Family'] |
| 3rd row | ['Romance', 'Comedy'] |
| 4th row | ['Comedy', 'Drama', 'Romance'] |
| 5th row | ['Comedy'] |
Common Values
| Value | Count | Frequency (%) |
| ['Drama'] | 5008 | 11.0% |
| ['Comedy'] | 3623 | 8.0% |
| ['Documentary'] | 2728 | 6.0% |
| [] | 2443 | 5.4% |
| ['Drama', 'Romance'] | 1303 | 2.9% |
| ['Comedy', 'Drama'] | 1140 | 2.5% |
| ['Horror'] | 974 | 2.1% |
| ['Comedy', 'Romance'] | 930 | 2.0% |
| ['Comedy', 'Drama', 'Romance'] | 593 | 1.3% |
| ['Drama', 'Comedy'] | 534 | 1.2% |
| Other values (4059) | 26266 |
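Because each cell is a stringified list, the same genres in a different order count as different values: `['Comedy', 'Drama']` (1140 rows) and `['Drama', 'Comedy']` (534 rows) are listed separately above. One way to make the comparison order-independent and to count individual genres is sketched below; the helper names are hypothetical.

```python
import ast
from collections import Counter
from typing import FrozenSet, Iterable


def genre_set(cell: str) -> FrozenSet[str]:
    """Parse "['Comedy', 'Drama']" into an order-independent set."""
    return frozenset(ast.literal_eval(cell))


def genre_counts(cells: Iterable[str]) -> Counter:
    """Count how many rows carry each individual genre."""
    counts: Counter = Counter()
    for cell in cells:
        counts.update(genre_set(cell))
    return counts


cells = ["['Drama']", "['Drama', 'Comedy']", "['Comedy', 'Drama']", "[]"]
print(genre_set(cells[1]) == genre_set(cells[2]))
print(genre_counts(cells))
```

Counting per genre would typically reduce the 4069 distinct combinations to a much smaller set of labels, which is easier to encode for modelling.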
Words
| Value | Count | Frequency (%) |
| drama | 20312 | |
| comedy | 13196 | |
| thriller | 7640 | 7.8% |
| romance | 6746 | 6.9% |
| action | 6607 | 6.8% |
| horror | 4679 | 4.8% |
| crime | 4314 | 4.4% |
| documentary | 3937 | 4.0% |
| adventure | 3508 | 3.6% |
| science | 3061 | 3.1% |
| Other values (37) | 23582 |
Most occurring characters
| Value | Count | Frequency (%) |
| ' | 182588 | |
| r | 69270 | 7.0% |
| a | 62006 | 6.3% |
| e | 55949 | 5.7% |
| m | 53232 | 5.4% |
| (space) | 52040 | 5.3% |
| o | 48661 | 4.9% |
| , | 48195 | 4.9% |
| [ | 45542 | 4.6% |
| ] | 45542 | 4.6% |
| Other values (33) | 320757 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 513960 | |
| Other Punctuation | 230783 | |
| Uppercase Letter | 95915 | 9.7% |
| Space Separator | 52040 | 5.3% |
| Open Punctuation | 45542 | 4.6% |
| Close Punctuation | 45542 | 4.6% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| r | 69270 | |
| a | 62006 | |
| e | 55949 | |
| m | 53232 | |
| o | 48661 | |
| i | 39819 | |
| n | 35814 | |
| y | 28585 | |
| c | 28080 | |
| t | 26310 | 5.1% |
| Other values (12) | 66234 |
Uppercase Letter
| Value | Count | Frequency (%) |
| D | 24249 | |
| C | 17513 | |
| A | 12062 | |
| F | 9789 | |
| T | 8413 | 8.8% |
| R | 6748 | 7.0% |
| H | 6078 | 6.3% |
| M | 4848 | 5.1% |
| S | 3065 | 3.2% |
| W | 2367 | 2.5% |
| Other values (6) | 783 | 0.8% |
Other Punctuation
| Value | Count | Frequency (%) |
| ' | 182588 | |
| , | 48195 | 20.9% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 52040 |
Open Punctuation
| Value | Count | Frequency (%) |
| [ | 45542 |
Close Punctuation
| Value | Count | Frequency (%) |
| ] | 45542 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 609875 | |
| Common | 373907 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| r | 69270 | |
| a | 62006 | 10.2% |
| e | 55949 | 9.2% |
| m | 53232 | 8.7% |
| o | 48661 | 8.0% |
| i | 39819 | 6.5% |
| n | 35814 | 5.9% |
| y | 28585 | 4.7% |
| c | 28080 | 4.6% |
| t | 26310 | 4.3% |
| Other values (28) | 162149 |
Common
| Value | Count | Frequency (%) |
| ' | 182588 | |
| (space) | 52040 | 13.9% |
| , | 48195 | 12.9% |
| [ | 45542 | 12.2% |
| ] | 45542 | 12.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 983782 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| ' | 182588 | |
| r | 69270 | 7.0% |
| a | 62006 | 6.3% |
| e | 55949 | 5.7% |
| m | 53232 | 5.4% |
| (space) | 52040 | 5.3% |
| o | 48661 | 4.9% |
| , | 48195 | 4.9% |
| [ | 45542 | 4.6% |
| ] | 45542 | 4.6% |
| Other values (33) | 320757 |
id
Categorical
HIGH CARDINALITY  UNIFORM 
| Distinct | 45436 |
|---|---|
| Distinct (%) | 99.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 355.9 KiB |
| 141971 | 9 |
|---|---|
| 14788 | 4 |
| 18440 | 4 |
| 15028 | 4 |
| 13209 | 4 |
| Other values (45431) |
Length
| Max length | 10 |
|---|---|
| Median length | 5 |
| Mean length | 5.2516359 |
| Min length | 1 |
Characters and Unicode
| Total characters | 239170 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 45393 |
|---|---|
| Unique (%) | 99.7% |
Sample
| 1st row | 862 |
|---|---|
| 2nd row | 8844 |
| 3rd row | 15602 |
| 4th row | 31357 |
| 5th row | 11862 |
Common Values
| Value | Count | Frequency (%) |
| 141971 | 9 | < 0.1% |
| 14788 | 4 | < 0.1% |
| 18440 | 4 | < 0.1% |
| 15028 | 4 | < 0.1% |
| 13209 | 4 | < 0.1% |
| 12600 | 4 | < 0.1% |
| 4912 | 4 | < 0.1% |
| 10991 | 4 | < 0.1% |
| 99080 | 4 | < 0.1% |
| 152795 | 4 | < 0.1% |
| Other values (45426) | 45497 |
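The profiler treats `id` as categorical text, some ids occur more than once (141971 appears 9 times), and the character statistics include 6 dashes, which suggests that a few entries are not plain integers. A sketch of validating and de-duplicating ids; the helper names and the date-like example string are hypothetical.

```python
from typing import Iterable, List, Optional


def parse_id(raw: str) -> Optional[int]:
    """Return the id as an int, or None if it is not purely ASCII digits."""
    text = raw.strip()
    return int(text) if text.isascii() and text.isdigit() else None


def first_per_id(ids: Iterable[str]) -> List[int]:
    """Keep one entry per valid id, preserving first-seen order."""
    seen = dict.fromkeys(i for i in map(parse_id, ids) if i is not None)
    return list(seen)


# "1997-08-20" is an invented example of a malformed id.
print(first_per_id(["862", "8844", "862", "1997-08-20"]))
```

Whether repeated ids are true duplicates or distinct records that share an id cannot be told from this report alone, so dropping them is a choice rather than a given.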
Words
| Value | Count | Frequency (%) |
| 141971 | 9 | < 0.1% |
| 42495 | 4 | < 0.1% |
| 14788 | 4 | < 0.1% |
| 69234 | 4 | < 0.1% |
| 265189 | 4 | < 0.1% |
| 119916 | 4 | < 0.1% |
| 109962 | 4 | < 0.1% |
| 25541 | 4 | < 0.1% |
| 132641 | 4 | < 0.1% |
| 11115 | 4 | < 0.1% |
| Other values (45426) | 45497 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 33016 | |
| 2 | 28673 | |
| 3 | 26752 | |
| 4 | 24787 | |
| 5 | 22038 | |
| 6 | 21207 | |
| 7 | 20975 | |
| 8 | 20938 | |
| 9 | 20540 | |
| 0 | 20238 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 239164 | |
| Dash Punctuation | 6 | < 0.1% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 33016 | |
| 2 | 28673 | |
| 3 | 26752 | |
| 4 | 24787 | |
| 5 | 22038 | |
| 6 | 21207 | |
| 7 | 20975 | |
| 8 | 20938 | |
| 9 | 20540 | |
| 0 | 20238 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 6 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 239170 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 33016 | |
| 2 | 28673 | |
| 3 | 26752 | |
| 4 | 24787 | |
| 5 | 22038 | |
| 6 | 21207 | |
| 7 | 20975 | |
| 8 | 20938 | |
| 9 | 20540 | |
| 0 | 20238 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 239170 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 33016 | |
| 2 | 28673 | |
| 3 | 26752 | |
| 4 | 24787 | |
| 5 | 22038 | |
| 6 | 21207 | |
| 7 | 20975 | |
| 8 | 20938 | |
| 9 | 20540 | |
| 0 | 20238 |
original_language
Categorical
HIGH CARDINALITY  IMBALANCE 
| Distinct | 92 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 11 |
| Missing (%) | < 0.1% |
| Memory size | 355.9 KiB |
| en | 32316 |
|---|---|
| fr | 2443 |
| it | 1529 |
| ja | 1356 |
| de | 1083 |
| Other values (87) |
Length
| Max length | 5 |
|---|---|
| Median length | 2 |
| Mean length | 2.0001537 |
| Min length | 2 |
Characters and Unicode
| Total characters | 91069 |
|---|---|
| Distinct characters | 33 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 20 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | en |
|---|---|
| 2nd row | en |
| 3rd row | en |
| 4th row | en |
| 5th row | en |
Common Values
| Value | Count | Frequency (%) |
| en | 32316 | |
| fr | 2443 | 5.4% |
| it | 1529 | 3.4% |
| ja | 1356 | 3.0% |
| de | 1083 | 2.4% |
| es | 994 | 2.2% |
| ru | 826 | 1.8% |
| hi | 508 | 1.1% |
| ko | 444 | 1.0% |
| zh | 409 | 0.9% |
| Other values (82) | 3623 | 8.0% |
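Nearly all values are two-letter codes (median and minimum length 2), but the maximum length is 5 and the character statistics for this column include 10 digits and 3 periods, so a few entries look like numbers rather than language codes. A sketch of filtering them out, assuming that valid codes consist of exactly two lowercase ASCII letters; `clean_language` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumption: a valid code is exactly two lowercase ASCII letters.
_CODE = re.compile(r"[a-z]{2}")


def clean_language(raw: Optional[str]) -> Optional[str]:
    """Return the code if it looks valid, otherwise None."""
    if raw is None:
        return None
    code = raw.strip()
    return code if _CODE.fullmatch(code) else None


# "104.0" is an invented example of a number-like entry.
print([clean_language(v) for v in ["en", "fr", "104.0", None]])
```

The pattern checks only the shape of the value; it does not verify that the two letters form a registered language code.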
Words
| Value | Count | Frequency (%) |
| en | 32316 | |
| fr | 2443 | 5.4% |
| it | 1529 | 3.4% |
| ja | 1356 | 3.0% |
| de | 1083 | 2.4% |
| es | 994 | 2.2% |
| ru | 826 | 1.8% |
| hi | 508 | 1.1% |
| ko | 444 | 1.0% |
| zh | 409 | 0.9% |
| Other values (82) | 3623 | 8.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 34648 | |
| n | 33025 | |
| r | 3641 | 4.0% |
| f | 2852 | 3.1% |
| i | 2397 | 2.6% |
| t | 2254 | 2.5% |
| a | 1851 | 2.0% |
| s | 1656 | 1.8% |
| j | 1357 | 1.5% |
| d | 1330 | 1.5% |
| Other values (23) | 6058 | 6.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 91056 | |
| Decimal Number | 10 | < 0.1% |
| Other Punctuation | 3 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 34648 | |
| n | 33025 | |
| r | 3641 | 4.0% |
| f | 2852 | 3.1% |
| i | 2397 | 2.6% |
| t | 2254 | 2.5% |
| a | 1851 | 2.0% |
| s | 1656 | 1.8% |
| j | 1357 | 1.5% |
| d | 1330 | 1.5% |
| Other values (16) | 6045 | 6.6% |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 4 | |
| 8 | 2 | |
| 2 | 1 | 10.0% |
| 6 | 1 | 10.0% |
| 1 | 1 | 10.0% |
| 4 | 1 | 10.0% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 3 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 91056 | |
| Common | 13 | < 0.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 34648 | |
| n | 33025 | |
| r | 3641 | 4.0% |
| f | 2852 | 3.1% |
| i | 2397 | 2.6% |
| t | 2254 | 2.5% |
| a | 1851 | 2.0% |
| s | 1656 | 1.8% |
| j | 1357 | 1.5% |
| d | 1330 | 1.5% |
| Other values (16) | 6045 | 6.6% |
Common
| Value | Count | Frequency (%) |
| 0 | 4 | |
| . | 3 | |
| 8 | 2 | |
| 2 | 1 | 7.7% |
| 6 | 1 | 7.7% |
| 1 | 1 | 7.7% |
| 4 | 1 | 7.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 91069 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 34648 | |
| n | 33025 | |
| r | 3641 | 4.0% |
| f | 2852 | 3.1% |
| i | 2397 | 2.6% |
| t | 2254 | 2.5% |
| a | 1851 | 2.0% |
| s | 1656 | 1.8% |
| j | 1357 | 1.5% |
| d | 1330 | 1.5% |
| Other values (23) | 6058 | 6.7% |
overview
Categorical
HIGH CARDINALITY  MISSING  UNIFORM 
| Distinct | 44307 |
|---|---|
| Distinct (%) | 99.4% |
| Missing | 954 |
| Missing (%) | 2.1% |
| Memory size | 355.9 KiB |
| No overview found. | 133 |
|---|---|
| Recovering from a nail gun shot to the head and 13 months of coma, doctor Pekka Valinta starts to unravel the mystery of his past, still suffering from total amnesia. | 9 |
| No Overview | 7 |
| (blank) | 5 |
| King Lear, old and tired, divides his kingdom among his daughters, giving great importance to their protestations of love for him. When Cordelia, youngest and most honest, refuses to idly flatter the old man in return for favor, he banishes her and turns for support to his remaining daughters. But Goneril and Regan have no love for him and instead plot to take all his power from him. In a parallel, Lear's loyal courtier Gloucester favors his illegitimate son Edmund after being told lies about his faithful son Edgar. Madness and tragedy befall both ill-starred fathers. | 5 |
| Other values (44302) |
Length
| Max length | 1000 |
|---|---|
| Median length | 785 |
| Mean length | 323.36072 |
| Min length | 1 |
Characters and Unicode
| Total characters | 14418008 |
|---|---|
| Distinct characters | 429 |
| Distinct categories | 25 |
| Distinct scripts | 13 |
| Distinct blocks | 21 |
Unique
| Unique | 44233 |
|---|---|
| Unique (%) | 99.2% |
Sample
| 1st row | Led by Woody, Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart, Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner, the duo eventually learns to put aside their differences. |
|---|---|
| 2nd row | When siblings Judy and Peter discover an enchanted board game that opens the door to a magical world, they unwittingly invite Alan -- an adult who's been trapped inside the game for 26 years -- into their living room. Alan's only hope for freedom is to finish the game, which proves risky as all three find themselves running from giant rhinoceroses, evil monkeys and other terrifying creatures. |
| 3rd row | A family wedding reignites the ancient feud between next-door neighbors and fishing buddies John and Max. Meanwhile, a sultry Italian divorcée opens a restaurant at the local bait shop, alarming the locals who worry she'll scare the fish away. But she's less interested in seafood than she is in cooking up a hot time with Max. |
| 4th row | Cheated on, mistreated and stepped on, the women are holding their breath, waiting for the elusive "good man" to break a string of less-than-stellar lovers. Friends and confidants Vannah, Bernie, Glo and Robin talk it all out, determined to find a better way to breathe. |
| 5th row | Just when George Banks has recovered from his daughter's wedding, he receives the news that she's pregnant ... and that George's wife, Nina, is expecting too. He was planning on selling their home, but that's a plan that -- like George -- will have to change with the arrival of both a grandchild and a kid of his own. |
Common Values
| Value | Count | Frequency (%) |
| No overview found. | 133 | 0.3% |
| Recovering from a nail gun shot to the head and 13 months of coma, doctor Pekka Valinta starts to unravel the mystery of his past, still suffering from total amnesia. | 9 | < 0.1% |
| No Overview | 7 | < 0.1% |
| (blank) | 5 | < 0.1% |
| King Lear, old and tired, divides his kingdom among his daughters, giving great importance to their protestations of love for him. When Cordelia, youngest and most honest, refuses to idly flatter the old man in return for favor, he banishes her and turns for support to his remaining daughters. But Goneril and Regan have no love for him and instead plot to take all his power from him. In a parallel, Lear's loyal courtier Gloucester favors his illegitimate son Edmund after being told lies about his faithful son Edgar. Madness and tragedy befall both ill-starred fathers. | 5 | < 0.1% |
| Prospero, the true Duke of Milan is now living on an enchanted island with his daughter Miranda, the savage Caliban and Ariel, a spirit of the air. Raising a sorm to bring his brother - the usurper of his dukedom - along with his royal entourage. to the island. Prospero contrives his revenge. | 4 | < 0.1% |
| East-Berlin, 1961, shortly after the erection of the Wall. Konrad, Sophie and three of their friends plan a daring escape to Western Germany. The attempt is successful, except for Konrad, who remains behind. From then on, and for the next 28 years, Konrad and Sophie will attempt to meet again, in spite of the Iron Curtain. Konrad, who has become a reputed Astrophysicist, tries to take advantage of scientific congresses outside Eastern Germany to arrange encounters with Sophie. But in a country where the political police, the Stasi, monitors the moves of all suspicious people (such as Konrad's sister Barbara and her husband Harald), preserving one's privacy, ideals and self-respect becomes an exhausting fight, even as the Eastern block begins its long process of disintegration. | 4 | < 0.1% |
| Since women are banned from soccer matches, Iranian females masquerade as males so they can slip into Tehran's stadium to see the game between Iran and Bahrain. The ones who are caught and arrested are taken to a holding area and guarded by soldiers. One sympathetic soldier agrees to watch the game through a peephole and recount the action to the impatient fans. | 4 | < 0.1% |
| Two literary women compete for 20 years: one writes for the critics; the other one, to get rich. | 4 | < 0.1% |
| In a hospital, ten soldiers are being treated for a mysterious sleeping sickness. In a story in which dreams can be experienced by others, and in which goddesses can sit casually with mortals, a nurse learns the reason why the patients will never be cured, and forms a telepathic bond with one of them. | 4 | < 0.1% |
| Other values (44297) | 44409 | |
| (Missing) | 954 | 2.1% |
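Besides the 954 missing cells, the table above shows placeholder texts that carry no information: "No overview found." (133 rows), "No Overview" (7 rows) and a blank value (5 rows). A sketch of folding these into the missing category; the placeholder set is taken from this table only and may not be complete, and `clean_overview` is a hypothetical helper.

```python
from typing import Optional

# Placeholder strings seen in the "Common Values" table (lowercased).
PLACEHOLDERS = {"no overview found.", "no overview"}


def clean_overview(raw: Optional[str]) -> Optional[str]:
    """Return the trimmed text, or None for blanks and placeholders."""
    if raw is None:
        return None
    text = raw.strip()
    if not text or text.lower() in PLACEHOLDERS:
        return None
    return text


print(clean_overview("No overview found."))
print(clean_overview("Two literary women compete for 20 years."))
```

Applied to the counts above, this would raise the number of effectively missing overviews from 954 to at least 1099.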
Words
| Value | Count | Frequency (%) |
| the | 138629 | 5.6% |
| a | 99198 | 4.0% |
| and | 75560 | 3.1% |
| to | 73582 | 3.0% |
| of | 69846 | 2.8% |
| in | 48314 | 2.0% |
| is | 36601 | 1.5% |
| his | 36290 | 1.5% |
| with | 23983 | 1.0% |
| her | 21568 | 0.9% |
| Other values (97181) | 1833961 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 2415028 | |
| e | 1368592 | 9.5% |
| a | 944079 | 6.5% |
| t | 938160 | 6.5% |
| i | 854670 | 5.9% |
| o | 832905 | 5.8% |
| n | 825681 | 5.7% |
| s | 770623 | 5.3% |
| r | 747089 | 5.2% |
| h | 602913 | 4.2% |
| Other values (419) | 4118268 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 11190655 | |
| Space Separator | 2415066 | 16.8% |
| Uppercase Letter | 392491 | 2.7% |
| Other Punctuation | 313935 | 2.2% |
| Decimal Number | 42400 | 0.3% |
| Dash Punctuation | 36898 | 0.3% |
| Close Punctuation | 10127 | 0.1% |
| Open Punctuation | 10105 | 0.1% |
| Final Punctuation | 4574 | < 0.1% |
| Initial Punctuation | 888 | < 0.1% |
| Other values (15) | 869 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 1368592 | |
| a | 944079 | 8.4% |
| t | 938160 | 8.4% |
| i | 854670 | 7.6% |
| o | 832905 | 7.4% |
| n | 825681 | 7.4% |
| s | 770623 | 6.9% |
| r | 747089 | 6.7% |
| h | 602913 | 5.4% |
| l | 480600 | 4.3% |
| Other values (142) | 2825343 |
Uppercase Letter
| Value | Count | Frequency (%) |
| A | 42898 | 10.9% |
| T | 36105 | 9.2% |
| S | 31263 | 8.0% |
| M | 24037 | 6.1% |
| B | 23794 | 6.1% |
| C | 22904 | 5.8% |
| H | 19496 | 5.0% |
| W | 18730 | 4.8% |
| I | 16876 | 4.3% |
| D | 16361 | 4.2% |
| Other values (77) | 140027 |
Other Letter
| Value | Count | Frequency (%) |
| र | 6 | 4.8% |
| न | 6 | 4.8% |
| म | 5 | 4.0% |
| の | 4 | 3.2% |
| द | 3 | 2.4% |
| प | 3 | 2.4% |
| ద | 3 | 2.4% |
| अ | 3 | 2.4% |
| న | 2 | 1.6% |
| ल | 2 | 1.6% |
| Other values (76) | 88 |
Other Punctuation
| Value | Count | Frequency (%) |
| , | 133945 | |
| . | 125208 | |
| ' | 31228 | 9.9% |
| " | 11701 | 3.7% |
| : | 3316 | 1.1% |
| ? | 2766 | 0.9% |
| ; | 2499 | 0.8% |
| ! | 1552 | 0.5% |
| / | 769 | 0.2% |
| & | 457 | 0.1% |
| Other values (12) | 494 | 0.2% |
Nonspacing Mark
| Value | Count | Frequency (%) |
| ి | 4 | |
| ́ | 4 | |
| ̈ | 3 | |
| ् | 3 | |
| ్ | 3 | |
| ் | 3 | |
| े | 2 | 6.1% |
| ं | 2 | 6.1% |
| ु | 2 | 6.1% |
| ా | 2 | 6.1% |
| Other values (4) | 5 |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 9792 | |
| 0 | 8300 | |
| 9 | 6434 | |
| 2 | 4270 | |
| 5 | 2448 | 5.8% |
| 8 | 2386 | 5.6% |
| 3 | 2357 | 5.6% |
| 4 | 2188 | 5.2% |
| 7 | 2135 | 5.0% |
| 6 | 2090 | 4.9% |
Spacing Mark
| Value | Count | Frequency (%) |
| ा | 11 | |
| ी | 4 | 14.8% |
| ు | 3 | 11.1% |
| ो | 3 | 11.1% |
| ि | 2 | 7.4% |
| ு | 2 | 7.4% |
| ం | 1 | 3.7% |
| ி | 1 | 3.7% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 35371 | |
| – | 885 | 2.4% |
| — | 633 | 1.7% |
| ― | 5 | < 0.1% |
| ‐ | 4 | < 0.1% |
Other Symbol
| Value | Count | Frequency (%) |
| ® | 45 | |
| ™ | 14 | 21.9% |
| ° | 2 | 3.1% |
| ¦ | 2 | 3.1% |
| � | 1 | 1.6% |
Math Symbol
| Value | Count | Frequency (%) |
| ~ | 20 | |
| + | 12 | |
| = | 6 | 14.0% |
| \| | 4 | 9.3% |
| − | 1 | 2.3% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 10051 | |
| [ | 51 | 0.5% |
| { | 2 | < 0.1% |
| „ | 1 | < 0.1% |
Currency Symbol
| Value | Count | Frequency (%) |
| $ | 318 | |
| £ | 10 | 3.0% |
| ₹ | 1 | 0.3% |
| € | 1 | 0.3% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 2415028 | |
| (other space character) | 36 | < 0.1% |
| (other space character) | 2 | < 0.1% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 10075 | |
| ] | 50 | 0.5% |
| } | 2 | < 0.1% |
Final Punctuation
| Value | Count | Frequency (%) |
| ’ | 3860 | |
| ” | 695 | 15.2% |
| » | 19 | 0.4% |
Initial Punctuation
| Value | Count | Frequency (%) |
| “ | 677 | |
| ‘ | 193 | 21.7% |
| « | 18 | 2.0% |
Control
| Value | Count | Frequency (%) |
| (control character) | 106 | |
| (control character) | 3 | 2.7% |
| (control character) | 1 | 0.9% |
Modifier Symbol
| Value | Count | Frequency (%) |
| ´ | 25 | |
| ` | 12 | |
| ¯ | 1 | 2.6% |
Format
| Value | Count | Frequency (%) |
| (format character) | 31 | |
| (format character) | 20 |
Other Number
| Value | Count | Frequency (%) |
| ½ | 8 | |
| ¹ | 8 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 19 |
Line Separator
| Value | Count | Frequency (%) |
| (U+2028 line separator) | 7 |
Paragraph Separator
| Value | Count | Frequency (%) |
| (U+2029 paragraph separator) | 2 |
Letter Number
| Value | Count | Frequency (%) |
| Ⅱ | 2 |
Modifier Letter
| Value | Count | Frequency (%) |
| ʼ | 2 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 11577914 | |
| Common | 2834675 | 19.7% |
| Cyrillic | 4587 | < 0.1% |
| Greek | 648 | < 0.1% |
| Devanagari | 77 | < 0.1% |
| Telugu | 30 | < 0.1% |
| Hiragana | 20 | < 0.1% |
| Tamil | 19 | < 0.1% |
| Han | 10 | < 0.1% |
| Hangul | 9 | < 0.1% |
| Other values (3) | 19 | < 0.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 1368592 | |
| a | 944079 | 8.2% |
| t | 938160 | 8.1% |
| i | 854670 | 7.4% |
| o | 832905 | 7.2% |
| n | 825681 | 7.1% |
| s | 770623 | 6.7% |
| r | 747089 | 6.5% |
| h | 602913 | 5.2% |
| l | 480600 | 4.2% |
| Other values (132) | 3212602 |
Common
| Value | Count | Frequency (%) |
| (space) | 2415028 | |
| , | 133945 | 4.7% |
| . | 125208 | 4.4% |
| - | 35371 | 1.2% |
| ' | 31228 | 1.1% |
| " | 11701 | 0.4% |
| ) | 10075 | 0.4% |
| ( | 10051 | 0.4% |
| 1 | 9792 | 0.3% |
| 0 | 8300 | 0.3% |
| Other values (71) | 43976 | 1.6% |
Cyrillic
| Value | Count | Frequency (%) |
| о | 470 | 10.2% |
| е | 404 | 8.8% |
| а | 373 | 8.1% |
| н | 323 | 7.0% |
| и | 299 | 6.5% |
| т | 265 | 5.8% |
| р | 240 | 5.2% |
| с | 218 | 4.8% |
| в | 173 | 3.8% |
| л | 161 | 3.5% |
| Other values (46) | 1661 |
Greek
| Value | Count | Frequency (%) |
| α | 60 | 9.3% |
| ο | 55 | 8.5% |
| τ | 43 | 6.6% |
| ι | 36 | 5.6% |
| η | 36 | 5.6% |
| ν | 34 | 5.2% |
| ε | 31 | 4.8% |
| ρ | 31 | 4.8% |
| ς | 30 | 4.6% |
| π | 30 | 4.6% |
| Other values (33) | 262 |
Devanagari
| Value | Count | Frequency (%) |
| ा | 11 | 14.3% |
| र | 6 | 7.8% |
| न | 6 | 7.8% |
| म | 5 | 6.5% |
| ी | 4 | 5.2% |
| द | 3 | 3.9% |
| प | 3 | 3.9% |
| ् | 3 | 3.9% |
| ो | 3 | 3.9% |
| अ | 3 | 3.9% |
| Other values (21) | 30 |
Hiragana
| Value | Count | Frequency (%) |
| の | 4 | |
| と | 1 | 5.0% |
| め | 1 | 5.0% |
| ひ | 1 | 5.0% |
| さ | 1 | 5.0% |
| そ | 1 | 5.0% |
| ち | 1 | 5.0% |
| ず | 1 | 5.0% |
| か | 1 | 5.0% |
| み | 1 | 5.0% |
| Other values (7) | 7 |
Telugu
| Value | Count | Frequency (%) |
| ి | 4 | |
| ు | 3 | |
| ్ | 3 | |
| ద | 3 | |
| న | 2 | 6.7% |
| మ | 2 | 6.7% |
| ర | 2 | 6.7% |
| ా | 2 | 6.7% |
| స | 2 | 6.7% |
| జ | 1 | 3.3% |
| Other values (6) | 6 |
Tamil
| Value | Count | Frequency (%) |
| ் | 3 | |
| ம | 2 | |
| ர | 2 | |
| ப | 2 | |
| ு | 2 | |
| ச | 1 | 5.3% |
| ண | 1 | 5.3% |
| ி | 1 | 5.3% |
| ய | 1 | 5.3% |
| ஆ | 1 | 5.3% |
| Other values (3) | 3 |
Han
| Value | Count | Frequency (%) |
| 水 | 1 | |
| 者 | 1 | |
| 患 | 1 | |
| 俣 | 1 | |
| 世 | 1 | |
| 界 | 1 | |
| 見 | 1 | |
| 鬼 | 1 | |
| 難 | 1 | |
| 海 | 1 |
Hangul
| Value | Count | Frequency (%) |
| 사 | 2 | |
| 회 | 1 | |
| 식 | 1 | |
| 주 | 1 | |
| 기 | 1 | |
| 찾 | 1 | |
| 랑 | 1 | |
| 첫 | 1 |
Thai
| Value | Count | Frequency (%) |
| ่ | 2 | |
| ส | 1 | |
| ี | 1 | |
| แ | 1 | |
| พ | 1 | |
| ร | 1 | |
| ง | 1 |
Arabic
| Value | Count | Frequency (%) |
| م | 2 | |
| ہ | 1 | |
| ت | 1 |
Inherited
| Value | Count | Frequency (%) |
| ́ | 4 | |
| ̈ | 3 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 14399960 | |
| Punctuation | 7299 | 0.1% |
| None | 5951 | < 0.1% |
| Cyrillic | 4587 | < 0.1% |
| Devanagari | 77 | < 0.1% |
| Telugu | 30 | < 0.1% |
| Hiragana | 20 | < 0.1% |
| Tamil | 19 | < 0.1% |
| Letterlike Symbols | 14 | < 0.1% |
| CJK | 10 | < 0.1% |
| Other values (11) | 41 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 2415028 | |
| e | 1368592 | 9.5% |
| a | 944079 | 6.6% |
| t | 938160 | 6.5% |
| i | 854670 | 5.9% |
| o | 832905 | 5.8% |
| n | 825681 | 5.7% |
| s | 770623 | 5.4% |
| r | 747089 | 5.2% |
| h | 602913 | 4.2% |
| Other values (82) | 4100220 |
Punctuation
| Value | Count | Frequency (%) |
| ’ | 3860 | |
| – | 885 | 12.1% |
| ” | 695 | 9.5% |
| “ | 677 | 9.3% |
| — | 633 | 8.7% |
| … | 304 | 4.2% |
| ‘ | 193 | 2.6% |
| | 31 | 0.4% |
| U+2028 (line separator) | 7 | 0.1% |
| ― | 5 | 0.1% |
| Other values (4) | 9 | 0.1% |
None
| Value | Count | Frequency (%) |
| é | 1568 | |
| ä | 294 | 4.9% |
| á | 293 | 4.9% |
| ö | 250 | 4.2% |
| í | 244 | 4.1% |
| è | 209 | 3.5% |
| ü | 178 | 3.0% |
| ı | 165 | 2.8% |
| ó | 164 | 2.8% |
| ç | 158 | 2.7% |
| Other values (141) | 2428 |
Cyrillic
| Value | Count | Frequency (%) |
| о | 470 | 10.2% |
| е | 404 | 8.8% |
| а | 373 | 8.1% |
| н | 323 | 7.0% |
| и | 299 | 6.5% |
| т | 265 | 5.8% |
| р | 240 | 5.2% |
| с | 218 | 4.8% |
| в | 173 | 3.8% |
| л | 161 | 3.5% |
| Other values (46) | 1661 |
Letterlike Symbols
| Value | Count | Frequency (%) |
| ™ | 14 |
Devanagari
| Value | Count | Frequency (%) |
| ा | 11 | 14.3% |
| र | 6 | 7.8% |
| न | 6 | 7.8% |
| म | 5 | 6.5% |
| ी | 4 | 5.2% |
| द | 3 | 3.9% |
| प | 3 | 3.9% |
| ् | 3 | 3.9% |
| ो | 3 | 3.9% |
| अ | 3 | 3.9% |
| Other values (21) | 30 |
Telugu
| Value | Count | Frequency (%) |
| ి | 4 | |
| ు | 3 | |
| ్ | 3 | |
| ద | 3 | |
| న | 2 | 6.7% |
| మ | 2 | 6.7% |
| ర | 2 | 6.7% |
| ా | 2 | 6.7% |
| స | 2 | 6.7% |
| జ | 1 | 3.3% |
| Other values (6) | 6 |
Hiragana
| Value | Count | Frequency (%) |
| の | 4 | |
| と | 1 | 5.0% |
| め | 1 | 5.0% |
| ひ | 1 | 5.0% |
| さ | 1 | 5.0% |
| そ | 1 | 5.0% |
| ち | 1 | 5.0% |
| ず | 1 | 5.0% |
| か | 1 | 5.0% |
| み | 1 | 5.0% |
| Other values (7) | 7 |
Diacriticals
| Value | Count | Frequency (%) |
| ́ | 4 | |
| ̈ | 3 |
Alphabetic PF
| Value | Count | Frequency (%) |
| fi | 4 |
Tamil
| Value | Count | Frequency (%) |
| ் | 3 | |
| ம | 2 | |
| ர | 2 | |
| ப | 2 | |
| ு | 2 | |
| ச | 1 | 5.3% |
| ண | 1 | 5.3% |
| ி | 1 | 5.3% |
| ய | 1 | 5.3% |
| ஆ | 1 | 5.3% |
| Other values (3) | 3 |
Hangul
| Value | Count | Frequency (%) |
| 사 | 2 | |
| 회 | 1 | |
| 식 | 1 | |
| 주 | 1 | |
| 기 | 1 | |
| 찾 | 1 | |
| 랑 | 1 | |
| 첫 | 1 |
Arabic
| Value | Count | Frequency (%) |
| م | 2 | |
| ہ | 1 | |
| ت | 1 |
Thai
| Value | Count | Frequency (%) |
| ่ | 2 | |
| ส | 1 | |
| ี | 1 | |
| แ | 1 | |
| พ | 1 | |
| ร | 1 | |
| ง | 1 |
Number Forms
| Value | Count | Frequency (%) |
| Ⅱ | 2 |
Modifier Letters
| Value | Count | Frequency (%) |
| ʼ | 2 |
CJK
| Value | Count | Frequency (%) |
| 水 | 1 | |
| 者 | 1 | |
| 患 | 1 | |
| 俣 | 1 | |
| 世 | 1 | |
| 界 | 1 | |
| 見 | 1 | |
| 鬼 | 1 | |
| 難 | 1 | |
| 海 | 1 |
Math Operators
| Value | Count | Frequency (%) |
| − | 1 |
Katakana
| Value | Count | Frequency (%) |
| ・ | 1 |
Currency Symbols
| Value | Count | Frequency (%) |
| ₹ | 1 | |
| € | 1 |
Specials
| Value | Count | Frequency (%) |
| � | 1 |
popularity
Unsupported
REJECTED  UNSUPPORTED 
| Missing | 5 |
|---|---|
| Missing (%) | < 0.1% |
| Memory size | 355.9 KiB |
poster_path
Categorical
HIGH CARDINALITY  UNIFORM 
| Distinct | 45024 |
|---|---|
| Distinct (%) | 99.7% |
| Missing | 386 |
| Missing (%) | 0.8% |
| Memory size | 355.9 KiB |
| /8VSZ9coCzxOCW2wE2Qene1H1fKO.jpg | 9 |
|---|---|
| /5D7UBSEgdyONE6Lql6xS7s6OLcW.jpg | 5 |
| /5GasjPRAy5rlEyDOH7MeOyxyQGX.jpg | 4 |
| /q19Q5BRZpMXoNCA4OYodVozfjUh.jpg | 4 |
| /sGMPDg6je1zKi0TiX9b4pP6yN02.jpg | 4 |
| Other values (45019) |
Length
| Max length | 35 |
|---|---|
| Median length | 32 |
| Mean length | 31.971676 |
| Min length | 12 |
Characters and Unicode
| Total characters | 1443713 |
|---|---|
| Distinct characters | 66 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 44963 |
|---|---|
| Unique (%) | 99.6% |
Sample
| 1st row | /rhIRbceoE9lR4veEXuwCC2wARtG.jpg |
|---|---|
| 2nd row | /vzmL6fP7aPKNKPRTFnZmiUfciyV.jpg |
| 3rd row | /6ksm1sjKMFLbO7UY2i6G1ju9SML.jpg |
| 4th row | /16XOMpEaLWkrcPqSQqhTmeJuqQl.jpg |
| 5th row | /e64sOI48hQXyru7naBFyssKFxVd.jpg |
Common Values
| Value | Count | Frequency (%) |
| /8VSZ9coCzxOCW2wE2Qene1H1fKO.jpg | 9 | < 0.1% |
| /5D7UBSEgdyONE6Lql6xS7s6OLcW.jpg | 5 | < 0.1% |
| /5GasjPRAy5rlEyDOH7MeOyxyQGX.jpg | 4 | < 0.1% |
| /q19Q5BRZpMXoNCA4OYodVozfjUh.jpg | 4 | < 0.1% |
| /sGMPDg6je1zKi0TiX9b4pP6yN02.jpg | 4 | < 0.1% |
| /z9WiHt5uQjs8L8tyBpRBKzlheF2.jpg | 4 | < 0.1% |
| /gLVRTxaLtUDkfscFKPyYrCtRnTk.jpg | 4 | < 0.1% |
| /nfkOkpudNNIjRrf0mTFVoiGzHyc.jpg | 4 | < 0.1% |
| /jn8L1QdWWX5c0NUOLjzaSXtZrbt.jpg | 4 | < 0.1% |
| /xGhDPrBz9mJN8CsIjA23jQSd3sc.jpg | 4 | < 0.1% |
| Other values (45014) | 45110 | |
| (Missing) | 386 | 0.8% |
Words
| Value | Count | Frequency (%) |
| 8vsz9coczxocw2we2qene1h1fko.jpg | 9 | < 0.1% |
| 5d7ubsegdyone6lql6xs7s6olcw.jpg | 5 | < 0.1% |
| nnkx3ahyot7p3au92dnglf4pkwa.jpg | 4 | < 0.1% |
| qenjwrvw9itr5pvp4cbkyfhvaop.jpg | 4 | < 0.1% |
| qw1oqlohizrhxzqrpkimyr0oxzn.jpg | 4 | < 0.1% |
| twcykxhusrqdlavneevjbnhf1yv.jpg | 4 | < 0.1% |
| iqd7zwhsece3cgdpclidxjgfdzl.jpg | 4 | < 0.1% |
| k0mf0iibj2pfoiku2kyraxl72d8.jpg | 4 | < 0.1% |
| 5iljs6xb5deihop8sxpsyxxwvpe.jpg | 4 | < 0.1% |
| w56oo9nrecf54snxvyue9qxzfjt.jpg | 4 | < 0.1% |
| Other values (45020) | 45116 |
Most occurring characters
| Value | Count | Frequency (%) |
| g | 65406 | 4.5% |
| p | 65258 | 4.5% |
| j | 65164 | 4.5% |
| . | 45153 | 3.1% |
| / | 45153 | 3.1% |
| v | 20463 | 1.4% |
| d | 20360 | 1.4% |
| m | 20349 | 1.4% |
| t | 20290 | 1.4% |
| q | 20276 | 1.4% |
| Other values (56) | 1055841 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 660240 | |
| Uppercase Letter | 492998 | |
| Decimal Number | 200162 | 13.9% |
| Other Punctuation | 90307 | 6.3% |
| Space Separator | 6 | < 0.1% |
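The category names in these tables match Unicode general categories (for example `Ll` is "Lowercase Letter" and `Po` is "Other Punctuation"). The following is a minimal sketch of how comparable per-category counts could be produced with Python's standard `unicodedata` module; it is not necessarily how the profiling tool computes them. The helper `category_counts` and the abbreviated `CATEGORY_NAMES` mapping are hypothetical names introduced for illustration, and the two input strings are copied from the Sample table of this variable.

```python
import unicodedata
from collections import Counter

# Partial mapping from Unicode general-category codes to the labels used in the report.
# Codes that are not listed here are reported under their two-letter code.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Zs": "Space Separator",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# Two values taken from the poster_path sample rows.
paths = [
    "/rhIRbceoE9lR4veEXuwCC2wARtG.jpg",
    "/vzmL6fP7aPKNKPRTFnZmiUfciyV.jpg",
]
counts = category_counts(paths)
for name, count in counts.most_common():
    print(name, count)
```

A stray "Space Separator" count such as the 6 reported above would show up here as a non-zero `counts["Space Separator"]`, which may point to malformed path values worth inspecting.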
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| g | 65406 | 9.9% |
| p | 65258 | 9.9% |
| j | 65164 | 9.9% |
| v | 20463 | 3.1% |
| d | 20360 | 3.1% |
| m | 20349 | 3.1% |
| t | 20290 | 3.1% |
| q | 20276 | 3.1% |
| n | 20270 | 3.1% |
| l | 20258 | 3.1% |
| Other values (16) | 322146 |
Uppercase Letter
| Value | Count | Frequency (%) |
| A | 19428 | 3.9% |
| R | 19224 | 3.9% |
| M | 19204 | 3.9% |
| C | 19194 | 3.9% |
| W | 19182 | 3.9% |
| V | 19176 | 3.9% |
| T | 19005 | 3.9% |
| K | 19004 | 3.9% |
| L | 19002 | 3.9% |
| D | 18970 | 3.8% |
| Other values (16) | 301609 |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 20254 | |
| 8 | 20250 | |
| 3 | 20187 | |
| 9 | 20145 | |
| 5 | 20138 | |
| 2 | 20092 | |
| 6 | 20033 | |
| 4 | 20033 | |
| 7 | 19923 | |
| 0 | 19107 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 45153 | |
| / | 45153 | |
| : | 1 | < 0.1% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 6 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 1153238 | |
| Common | 290475 | 20.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| g | 65406 | 5.7% |
| p | 65258 | 5.7% |
| j | 65164 | 5.7% |
| v | 20463 | 1.8% |
| d | 20360 | 1.8% |
| m | 20349 | 1.8% |
| t | 20290 | 1.8% |
| q | 20276 | 1.8% |
| n | 20270 | 1.8% |
| l | 20258 | 1.8% |
| Other values (42) | 815144 |
Common
| Value | Count | Frequency (%) |
| . | 45153 | |
| / | 45153 | |
| 1 | 20254 | |
| 8 | 20250 | |
| 3 | 20187 | |
| 9 | 20145 | |
| 5 | 20138 | |
| 2 | 20092 | |
| 6 | 20033 | |
| 4 | 20033 | |
| Other values (4) | 39037 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1443713 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| g | 65406 | 4.5% |
| p | 65258 | 4.5% |
| j | 65164 | 4.5% |
| . | 45153 | 3.1% |
| / | 45153 | 3.1% |
| v | 20463 | 1.4% |
| d | 20360 | 1.4% |
| m | 20349 | 1.4% |
| t | 20290 | 1.4% |
| q | 20276 | 1.4% |
| Other values (56) | 1055841 |
production_companies
Categorical
| Distinct | 22581 |
|---|---|
| Distinct (%) | 49.6% |
| Missing | 3 |
| Missing (%) | < 0.1% |
| Memory size | 355.9 KiB |
| [] | 11958 |
|---|---|
| ['Metro-Goldwyn-Mayer (MGM)'] | 772 |
| ['Warner Bros.'] | 540 |
| ['Paramount Pictures'] | 507 |
| ['Twentieth Century Fox Film Corporation'] | 441 |
| Other values (22576) |
Length
| Max length | 663 |
|---|---|
| Median length | 489 |
| Mean length | 35.501109 |
| Min length | 2 |
Characters and Unicode
| Total characters | 1616685 |
|---|---|
| Distinct characters | 291 |
| Distinct categories | 15 |
| Distinct scripts | 6 |
| Distinct blocks | 6 |
Unique
| Unique | 20216 |
|---|---|
| Unique (%) | 44.4% |
Sample
| 1st row | ['Pixar Animation Studios'] |
|---|---|
| 2nd row | ['TriStar Pictures', 'Teitler Film', 'Interscope Communications'] |
| 3rd row | ['Warner Bros.', 'Lancaster Gate'] |
| 4th row | ['Twentieth Century Fox Film Corporation'] |
| 5th row | ['Sandollar Productions', 'Touchstone Pictures'] |
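The sample rows look like Python list literals stored as strings, which would explain why the column is profiled as Categorical with so many distinct values. The following is a minimal sketch, assuming every cell is either missing or a valid Python list literal, of parsing such cells with `ast.literal_eval` and counting individual companies. The helper name `parse_list_cell` is hypothetical, and the input rows are copied from the sample above plus one empty list and one missing value.

```python
import ast
from collections import Counter

def parse_list_cell(cell):
    """Parse a stringified list such as "['A', 'B']"; return [] for missing or malformed cells."""
    if not isinstance(cell, str):
        return []
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return []
    return list(value) if isinstance(value, (list, tuple)) else []

# Sample rows from the table above, plus an empty list and a missing value.
rows = [
    "['Pixar Animation Studios']",
    "['TriStar Pictures', 'Teitler Film', 'Interscope Communications']",
    "['Warner Bros.', 'Lancaster Gate']",
    "['Twentieth Century Fox Film Corporation']",
    "['Sandollar Productions', 'Touchstone Pictures']",
    "[]",
    None,
]

# Count each company once per row in which it appears.
company_counts = Counter(name for row in rows for name in parse_list_cell(row))
print(company_counts.most_common(3))
```

Counting per company rather than per list string avoids treating `['Warner Bros.']` and `['Warner Bros.', 'Lancaster Gate']` as unrelated categories.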
Common Values
| Value | Count | Frequency (%) |
| [] | 11958 | 26.3% |
| ['Metro-Goldwyn-Mayer (MGM)'] | 772 | 1.7% |
| ['Warner Bros.'] | 540 | 1.2% |
| ['Paramount Pictures'] | 507 | 1.1% |
| ['Twentieth Century Fox Film Corporation'] | 441 | 1.0% |
| ['Universal Pictures'] | 322 | 0.7% |
| ['RKO Radio Pictures'] | 247 | 0.5% |
| ['Columbia Pictures Corporation'] | 207 | 0.5% |
| ['Columbia Pictures'] | 147 | 0.3% |
| ['Mosfilm'] | 145 | 0.3% |
| Other values (22571) | 30253 |
Words
| Value | Count | Frequency (%) |
| | 12801 | 6.8% |
| films | 9400 | 5.0% |
| pictures | 9274 | 4.9% |
| productions | 9005 | 4.8% |
| film | 6673 | 3.5% |
| entertainment | 5149 | 2.7% |
| corporation | 2190 | 1.2% |
| company | 1749 | 0.9% |
| warner | 1478 | 0.8% |
| bros | 1411 | 0.7% |
| Other values (18395) | 129356 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 142959 | 8.8% |
| ' | 140576 | 8.7% |
| i | 106400 | 6.6% |
| e | 93948 | 5.8% |
| n | 89480 | 5.5% |
| o | 84817 | 5.2% |
| r | 83218 | 5.1% |
| t | 83083 | 5.1% |
| a | 76920 | 4.8% |
| s | 62156 | 3.8% |
| Other values (281) | 653128 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 981535 | |
| Uppercase Letter | 197983 | 12.2% |
| Other Punctuation | 185006 | 11.4% |
| Space Separator | 142959 | 8.8% |
| Open Punctuation | 49846 | 3.1% |
| Close Punctuation | 49845 | 3.1% |
| Decimal Number | 4357 | 0.3% |
| Dash Punctuation | 4308 | 0.3% |
| Math Symbol | 666 | < 0.1% |
| Other Letter | 140 | < 0.1% |
| Other values (5) | 40 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| i | 106400 | |
| e | 93948 | |
| n | 89480 | |
| o | 84817 | |
| r | 83218 | |
| t | 83083 | |
| a | 76920 | 7.8% |
| s | 62156 | 6.3% |
| l | 50877 | 5.2% |
| m | 44113 | 4.5% |
| Other values (102) | 206523 |
Other Letter
| Value | Count | Frequency (%) |
| 스 | 9 | 6.4% |
| 트 | 8 | 5.7% |
| 인 | 6 | 4.3% |
| 주 | 5 | 3.6% |
| 엔 | 5 | 3.6% |
| 터 | 5 | 3.6% |
| 테 | 5 | 3.6% |
| 먼 | 5 | 3.6% |
| 픽 | 4 | 2.9% |
| 디 | 3 | 2.1% |
| Other values (62) | 85 |
Uppercase Letter
| Value | Count | Frequency (%) |
| P | 27812 | |
| F | 26283 | |
| C | 20428 | 10.3% |
| M | 13340 | 6.7% |
| S | 11881 | 6.0% |
| E | 9684 | 4.9% |
| A | 9426 | 4.8% |
| T | 9352 | 4.7% |
| B | 8966 | 4.5% |
| G | 7806 | 3.9% |
| Other values (52) | 53005 |
Other Punctuation
| Value | Count | Frequency (%) |
| ' | 140576 | |
| , | 37114 | 20.1% |
| . | 5668 | 3.1% |
| & | 764 | 0.4% |
| / | 648 | 0.4% |
| " | 133 | 0.1% |
| ! | 36 | < 0.1% |
| \ | 24 | < 0.1% |
| % | 18 | < 0.1% |
| : | 9 | < 0.1% |
| Other values (6) | 16 | < 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 1041 | |
| 1 | 716 | |
| 0 | 652 | |
| 3 | 558 | |
| 4 | 482 | |
| 9 | 204 | 4.7% |
| 6 | 197 | 4.5% |
| 7 | 174 | 4.0% |
| 5 | 171 | 3.9% |
| 8 | 162 | 3.7% |
Open Punctuation
| Value | Count | Frequency (%) |
| [ | 45548 | |
| ( | 4297 | 8.6% |
| ( | 1 | < 0.1% |
Close Punctuation
| Value | Count | Frequency (%) |
| ] | 45548 | |
| ) | 4296 | 8.6% |
| ) | 1 | < 0.1% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 4306 | |
| – | 2 | < 0.1% |
Math Symbol
| Value | Count | Frequency (%) |
| + | 665 | |
| | | 1 | 0.2% |
Other Symbol
| Value | Count | Frequency (%) |
| ° | 23 | |
| ㈜ | 2 | 8.0% |
Final Punctuation
| Value | Count | Frequency (%) |
| ’ | 3 | |
| » | 3 |
Other Number
| Value | Count | Frequency (%) |
| ½ | 1 | |
| ² | 1 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 142959 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 4 |
Initial Punctuation
| Value | Count | Frequency (%) |
| « | 3 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 1179115 | |
| Common | 437025 | 27.0% |
| Cyrillic | 373 | < 0.1% |
| Hangul | 115 | < 0.1% |
| Greek | 31 | < 0.1% |
| Han | 26 | < 0.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| i | 106400 | 9.0% |
| e | 93948 | 8.0% |
| n | 89480 | 7.6% |
| o | 84817 | 7.2% |
| r | 83218 | 7.1% |
| t | 83083 | 7.0% |
| a | 76920 | 6.5% |
| s | 62156 | 5.3% |
| l | 50877 | 4.3% |
| m | 44113 | 3.7% |
| Other values (99) | 404103 |
Hangul
| Value | Count | Frequency (%) |
| 스 | 9 | 7.8% |
| 트 | 8 | 7.0% |
| 인 | 6 | 5.2% |
| 주 | 5 | 4.3% |
| 엔 | 5 | 4.3% |
| 터 | 5 | 4.3% |
| 테 | 5 | 4.3% |
| 먼 | 5 | 4.3% |
| 픽 | 4 | 3.5% |
| 디 | 3 | 2.6% |
| Other values (43) | 60 |
Cyrillic
| Value | Count | Frequency (%) |
| и | 34 | 9.1% |
| о | 28 | 7.5% |
| а | 26 | 7.0% |
| л | 22 | 5.9% |
| н | 20 | 5.4% |
| м | 19 | 5.1% |
| т | 17 | 4.6% |
| ь | 16 | 4.3% |
| е | 16 | 4.3% |
| с | 16 | 4.3% |
| Other values (36) | 159 |
Common
| Value | Count | Frequency (%) |
| (space) | 142959 | |
| ' | 140576 | |
| [ | 45548 | 10.4% |
| ] | 45548 | 10.4% |
| , | 37114 | 8.5% |
| . | 5668 | 1.3% |
| - | 4306 | 1.0% |
| ( | 4297 | 1.0% |
| ) | 4296 | 1.0% |
| 2 | 1041 | 0.2% |
| Other values (34) | 5672 | 1.3% |
Greek
| Value | Count | Frequency (%) |
| ν | 3 | 9.7% |
| ο | 3 | 9.7% |
| ρ | 2 | 6.5% |
| τ | 2 | 6.5% |
| ι | 2 | 6.5% |
| η | 2 | 6.5% |
| λ | 2 | 6.5% |
| Ε | 2 | 6.5% |
| Κ | 2 | 6.5% |
| γ | 1 | 3.2% |
| Other values (10) | 10 |
Han
| Value | Count | Frequency (%) |
| 北 | 2 | 7.7% |
| 京 | 2 | 7.7% |
| 影 | 2 | 7.7% |
| 有 | 2 | 7.7% |
| 限 | 2 | 7.7% |
| 公 | 2 | 7.7% |
| 司 | 2 | 7.7% |
| 乐 | 1 | 3.8% |
| 行 | 1 | 3.8% |
| 发 | 1 | 3.8% |
| Other values (9) | 9 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1610607 | |
| None | 5560 | 0.3% |
| Cyrillic | 373 | < 0.1% |
| Hangul | 113 | < 0.1% |
| CJK | 26 | < 0.1% |
| Punctuation | 6 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 142959 | 8.9% |
| ' | 140576 | 8.7% |
| i | 106400 | 6.6% |
| e | 93948 | 5.8% |
| n | 89480 | 5.6% |
| o | 84817 | 5.3% |
| r | 83218 | 5.2% |
| t | 83083 | 5.2% |
| a | 76920 | 4.8% |
| s | 62156 | 3.9% |
| Other values (76) | 647050 |
None
| Value | Count | Frequency (%) |
| é | 3058 | |
| ó | 416 | 7.5% |
| á | 317 | 5.7% |
| í | 172 | 3.1% |
| ñ | 150 | 2.7% |
| ü | 148 | 2.7% |
| ä | 139 | 2.5% |
| ö | 134 | 2.4% |
| è | 129 | 2.3% |
| ô | 128 | 2.3% |
| Other values (75) | 769 | 13.8% |
Cyrillic
| Value | Count | Frequency (%) |
| и | 34 | 9.1% |
| о | 28 | 7.5% |
| а | 26 | 7.0% |
| л | 22 | 5.9% |
| н | 20 | 5.4% |
| м | 19 | 5.1% |
| т | 17 | 4.6% |
| ь | 16 | 4.3% |
| е | 16 | 4.3% |
| с | 16 | 4.3% |
| Other values (36) | 159 |
Hangul
| Value | Count | Frequency (%) |
| 스 | 9 | 8.0% |
| 트 | 8 | 7.1% |
| 인 | 6 | 5.3% |
| 주 | 5 | 4.4% |
| 엔 | 5 | 4.4% |
| 터 | 5 | 4.4% |
| 테 | 5 | 4.4% |
| 먼 | 5 | 4.4% |
| 픽 | 4 | 3.5% |
| 디 | 3 | 2.7% |
| Other values (42) | 58 |
Punctuation
| Value | Count | Frequency (%) |
| ’ | 3 | |
| – | 2 | |
| • | 1 | 16.7% |
CJK
| Value | Count | Frequency (%) |
| 北 | 2 | 7.7% |
| 京 | 2 | 7.7% |
| 影 | 2 | 7.7% |
| 有 | 2 | 7.7% |
| 限 | 2 | 7.7% |
| 公 | 2 | 7.7% |
| 司 | 2 | 7.7% |
| 乐 | 1 | 3.8% |
| 行 | 1 | 3.8% |
| 发 | 1 | 3.8% |
| Other values (9) | 9 |
production_countries
Categorical
HIGH CARDINALITY  IMBALANCE 
| Distinct | 2387 |
|---|---|
| Distinct (%) | 5.2% |
| Missing | 3 |
| Missing (%) | < 0.1% |
| Memory size | 355.9 KiB |
| ['United States of America'] | 17873 |
|---|---|
| [] | 6295 |
| ['United Kingdom'] | 2241 |
| ['France'] | 1657 |
| ['Japan'] | 1360 |
| Other values (2382) |
Length
| Max length | 289 |
|---|---|
| Median length | 199 |
| Mean length | 20.591515 |
| Min length | 2 |
Characters and Unicode
| Total characters | 937717 |
|---|---|
| Distinct characters | 55 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 1761 |
|---|---|
| Unique (%) | 3.9% |
Sample
| 1st row | ['United States of America'] |
|---|---|
| 2nd row | ['United States of America'] |
| 3rd row | ['United States of America'] |
| 4th row | ['United States of America'] |
| 5th row | ['United States of America'] |
Common Values
| Value | Count | Frequency (%) |
| ['United States of America'] | 17873 | 39.2% |
| [] | 6295 | 13.8% |
| ['United Kingdom'] | 2241 | 4.9% |
| ['France'] | 1657 | 3.6% |
| ['Japan'] | 1360 | 3.0% |
| ['Italy'] | 1030 | 2.3% |
| ['Canada'] | 842 | 1.8% |
| ['Germany'] | 752 | 1.7% |
| ['India'] | 735 | 1.6% |
| ['Russia'] | 735 | 1.6% |
| Other values (2377) | 12019 |
Words
| Value | Count | Frequency (%) |
| united | 25313 | |
| states | 21183 | |
| of | 21182 | |
| america | 21182 | |
| | 6295 | 5.0% |
| kingdom | 4103 | 3.3% |
| france | 3957 | 3.2% |
| germany | 2272 | 1.8% |
| italy | 2175 | 1.7% |
| canada | 1766 | 1.4% |
| Other values (173) | 15864 |
Most occurring characters
| Value | Count | Frequency (%) |
| ' | 99086 | 10.6% |
| e | 80807 | 8.6% |
| (space) | 79753 | 8.5% |
| t | 72742 | 7.8% |
| a | 70650 | 7.5% |
| i | 58657 | 6.3% |
| n | 47625 | 5.1% |
| ] | 45539 | 4.9% |
| [ | 45539 | 4.9% |
| d | 34624 | 3.7% |
| Other values (45) | 302695 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 559726 | |
| Other Punctuation | 109385 | 11.7% |
| Uppercase Letter | 97775 | 10.4% |
| Space Separator | 79753 | 8.5% |
| Close Punctuation | 45539 | 4.9% |
| Open Punctuation | 45539 | 4.9% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 80807 | |
| t | 72742 | |
| a | 70650 | |
| i | 58657 | |
| n | 47625 | |
| d | 34624 | |
| r | 32569 | |
| o | 29629 | 5.3% |
| m | 28768 | 5.1% |
| c | 26417 | 4.7% |
| Other values (16) | 77238 |
Uppercase Letter
| Value | Count | Frequency (%) |
| U | 25414 | |
| S | 23880 | |
| A | 22424 | |
| K | 5232 | 5.4% |
| F | 4358 | 4.5% |
| I | 3598 | 3.7% |
| C | 2593 | 2.7% |
| G | 2485 | 2.5% |
| J | 1670 | 1.7% |
| R | 1305 | 1.3% |
| Other values (14) | 4816 | 4.9% |
Other Punctuation
| Value | Count | Frequency (%) |
| ' | 99086 | |
| , | 10299 | 9.4% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 79753 |
Close Punctuation
| Value | Count | Frequency (%) |
| ] | 45539 |
Open Punctuation
| Value | Count | Frequency (%) |
| [ | 45539 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 657501 | |
| Common | 280216 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 80807 | |
| t | 72742 | |
| a | 70650 | |
| i | 58657 | 8.9% |
| n | 47625 | 7.2% |
| d | 34624 | 5.3% |
| r | 32569 | 5.0% |
| o | 29629 | 4.5% |
| m | 28768 | 4.4% |
| c | 26417 | 4.0% |
| Other values (40) | 175013 |
Common
| Value | Count | Frequency (%) |
| ' | 99086 | |
| (space) | 79753 | |
| ] | 45539 | |
| [ | 45539 | |
| , | 10299 | 3.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 937717 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| ' | 99086 | 10.6% |
| e | 80807 | 8.6% |
| (space) | 79753 | 8.5% |
| t | 72742 | 7.8% |
| a | 70650 | 7.5% |
| i | 58657 | 6.3% |
| n | 47625 | 5.1% |
| ] | 45539 | 4.9% |
| [ | 45539 | 4.9% |
| d | 34624 | 3.7% |
| Other values (45) | 302695 |
release_date
Categorical
| Distinct | 17333 |
|---|---|
| Distinct (%) | 38.1% |
| Missing | 90 |
| Missing (%) | 0.2% |
| Memory size | 355.9 KiB |
| 2008-01-01 | 136 |
|---|---|
| 2009-01-01 | 121 |
| 2007-01-01 | 120 |
| 2005-01-01 | 111 |
| 2006-01-01 | 101 |
| Other values (17328) |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
Characters and Unicode
| Total characters | 454520 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 8569 |
|---|---|
| Unique (%) | 18.9% |
Sample
| 1st row | 1995-10-30 |
|---|---|
| 2nd row | 1995-12-15 |
| 3rd row | 1995-12-22 |
| 4th row | 1995-12-22 |
| 5th row | 1995-02-10 |
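Every value has length 10 and the samples follow the `YYYY-MM-DD` pattern, so the column could plausibly be parsed as dates rather than treated as Categorical. The following is a minimal sketch, assuming ISO-formatted strings, that parses values with `datetime.date.fromisoformat` and flags January 1 dates. Those dominate the Common Values table and may be placeholders for releases where only the year is known; that interpretation is an assumption, not something the report states. The helper name `parse_release_date` and the invalid test string are hypothetical.

```python
from datetime import date

def parse_release_date(text):
    """Return a date for a 'YYYY-MM-DD' string, or None if the value is missing or invalid."""
    if not text:
        return None
    try:
        return date.fromisoformat(text)
    except ValueError:
        return None

# Two sample rows, one common value, one missing value, one malformed value.
samples = ["1995-10-30", "1995-12-15", "2008-01-01", None, "not-a-date"]
parsed = [parse_release_date(s) for s in samples]

# Dates that fall on January 1 (possible year-only placeholders).
jan_first = [d for d in parsed if d is not None and d.month == 1 and d.day == 1]
print(parsed)
print(jan_first)
```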
Common Values
| Value | Count | Frequency (%) |
| 2008-01-01 | 136 | 0.3% |
| 2009-01-01 | 121 | 0.3% |
| 2007-01-01 | 120 | 0.3% |
| 2005-01-01 | 111 | 0.2% |
| 2006-01-01 | 101 | 0.2% |
| 2002-01-01 | 96 | 0.2% |
| 2004-01-01 | 90 | 0.2% |
| 2001-01-01 | 84 | 0.2% |
| 2003-01-01 | 76 | 0.2% |
| 1997-01-01 | 70 | 0.2% |
| Other values (17323) | 44447 | |
| (Missing) | 90 | 0.2% |
Words
| Value | Count | Frequency (%) |
| 2008-01-01 | 136 | 0.3% |
| 2009-01-01 | 121 | 0.3% |
| 2007-01-01 | 120 | 0.3% |
| 2005-01-01 | 111 | 0.2% |
| 2006-01-01 | 101 | 0.2% |
| 2002-01-01 | 96 | 0.2% |
| 2004-01-01 | 90 | 0.2% |
| 2001-01-01 | 84 | 0.2% |
| 2003-01-01 | 76 | 0.2% |
| 1997-01-01 | 70 | 0.2% |
| Other values (17323) | 44447 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 97780 | |
| - | 90904 | |
| 1 | 84168 | |
| 2 | 52924 | |
| 9 | 39824 | |
| 3 | 15474 | 3.4% |
| 8 | 15303 | 3.4% |
| 6 | 15047 | 3.3% |
| 5 | 14857 | 3.3% |
| 7 | 14310 | 3.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 363616 | |
| Dash Punctuation | 90904 | 20.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 97780 | |
| 1 | 84168 | |
| 2 | 52924 | |
| 9 | 39824 | |
| 3 | 15474 | 4.3% |
| 8 | 15303 | 4.2% |
| 6 | 15047 | 4.1% |
| 5 | 14857 | 4.1% |
| 7 | 14310 | 3.9% |
| 4 | 13929 | 3.8% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 90904 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 454520 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 97780 | |
| - | 90904 | |
| 1 | 84168 | |
| 2 | 52924 | |
| 9 | 39824 | |
| 3 | 15474 | 3.4% |
| 8 | 15303 | 3.4% |
| 6 | 15047 | 3.3% |
| 5 | 14857 | 3.3% |
| 7 | 14310 | 3.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 454520 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 97780 | |
| - | 90904 | |
| 1 | 84168 | |
| 2 | 52924 | |
| 9 | 39824 | |
| 3 | 15474 | 3.4% |
| 8 | 15303 | 3.4% |
| 6 | 15047 | 3.3% |
| 5 | 14857 | 3.3% |
| 7 | 14310 | 3.1% |
revenue
Real number (ℝ)
| Distinct | 6863 |
|---|---|
| Distinct (%) | 15.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11196880 |
| Minimum | 0 |
|---|---|
| Maximum | 2.7879651 × 10⁹ |
| Zeros | 38114 |
| Zeros (%) | 83.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 355.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 47734422 |
| Maximum | 2.7879651 × 10⁹ |
| Range | 2.7879651 × 10⁹ |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 64277481 |
|---|---|
| Coefficient of variation (CV) | 5.7406601 |
| Kurtosis | 237.90628 |
| Mean | 11196880 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 12.27583 |
| Sum | 5.099283 × 10¹¹ |
| Variance | 4.1315946 × 10¹⁵ |
| Monotonicity | Not monotonic |
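Two of the descriptive statistics above follow directly from others in the same table: the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. The following is a short sketch that re-derives both from the reported (rounded) figures, so small discrepancies in the last digits are expected.

```python
import math

# Figures copied from the descriptive statistics table (rounded as reported).
mean = 11196880
std = 64277481
reported_cv = 5.7406601
reported_variance = 4.1315946e15

cv = std / mean          # coefficient of variation = standard deviation / mean
variance = std ** 2      # variance = standard deviation squared

print(f"CV: {cv:.7f} (reported {reported_cv})")
print(f"Variance: {variance:.7e} (reported {reported_variance:.7e})")
print(math.isclose(cv, reported_cv, rel_tol=1e-5))
print(math.isclose(variance, reported_variance, rel_tol=1e-5))
```

Because 83.7% of the values are zero, the mean is much larger than the median of 0, which is one reason the coefficient of variation is so large.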
Common Values
| Value | Count | Frequency (%) |
| 0 | 38114 | 83.7% |
| 12000000 | 20 | < 0.1% |
| 10000000 | 19 | < 0.1% |
| 11000000 | 19 | < 0.1% |
| 2000000 | 18 | < 0.1% |
| 6000000 | 17 | < 0.1% |
| 5000000 | 14 | < 0.1% |
| 8000000 | 13 | < 0.1% |
| 500000 | 13 | < 0.1% |
| 14000000 | 12 | < 0.1% |
| Other values (6853) | 7283 | 16.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 38114 | 83.7% |
| 1 | 12 | < 0.1% |
| 2 | 3 | < 0.1% |
| 3 | 9 | < 0.1% |
| 4 | 4 | < 0.1% |
| 5 | 5 | < 0.1% |
| 6 | 2 | < 0.1% |
| 7 | 4 | < 0.1% |
| 8 | 5 | < 0.1% |
| 9 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 2787965087 | 1 | |
| 2068223624 | 1 | |
| 1845034188 | 1 | |
| 1519557910 | 1 | |
| 1513528810 | 1 | |
| 1506249360 | 1 | |
| 1405403694 | 1 | |
| 1342000000 | 1 | |
| 1274219009 | 1 | |
| 1262886337 | 1 |
runtime
Real number (ℝ)
| Distinct | 353 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 263 |
| Missing (%) | 0.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 94.126438 |
| Minimum | 0 |
|---|---|
| Maximum | 1256 |
| Zeros | 1559 |
| Zeros (%) | 3.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 355.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 11 |
| Q1 | 85 |
| median | 95 |
| Q3 | 107 |
| 95-th percentile | 138 |
| Maximum | 1256 |
| Range | 1256 |
| Interquartile range (IQR) | 22 |
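The interquartile range above is Q3 minus Q1 (107 − 85 = 22). As an illustration of how these quartiles could be used, the sketch below applies the common 1.5 × IQR rule (Tukey's fences) to flag unusual runtimes. This rule is a convention added here for illustration, not something the report itself computes, and the helper name `is_outlier` is hypothetical.

```python
# Quartiles copied from the quantile statistics table.
q1, q3 = 85, 107

iqr = q3 - q1                      # interquartile range
lower_fence = q1 - 1.5 * iqr       # values below this are flagged
upper_fence = q3 + 1.5 * iqr       # values above this are flagged

def is_outlier(minutes):
    """Return True if a runtime falls outside the 1.5 * IQR fences."""
    return minutes < lower_fence or minutes > upper_fence

print(iqr, lower_fence, upper_fence)
print(is_outlier(0), is_outlier(95), is_outlier(1256))
```

Under this rule the 1559 zero-minute entries and the reported maximum of 1256 would both be flagged, while the median of 95 would not.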
Descriptive statistics
| Standard deviation | 38.398308 |
|---|---|
| Coefficient of variation (CV) | 0.40794392 |
| Kurtosis | 93.155665 |
| Mean | 94.126438 |
| Median Absolute Deviation (MAD) | 11 |
| Skewness | 4.4608866 |
| Sum | 4261951 |
| Variance | 1474.4301 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 90 | 2559 | 5.6% |
| 0 | 1559 | 3.4% |
| 100 | 1471 | 3.2% |
| 95 | 1414 | 3.1% |
| 93 | 1219 | 2.7% |
| 96 | 1104 | 2.4% |
| 92 | 1082 | 2.4% |
| 94 | 1064 | 2.3% |
| 91 | 1058 | 2.3% |
| 88 | 1032 | 2.3% |
| Other values (343) | 31717 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 1559 | 3.4% |
| 1 | 107 | 0.2% |
| 2 | 34 | 0.1% |
| 3 | 49 | 0.1% |
| 4 | 51 | 0.1% |
| 5 | 51 | 0.1% |
| 6 | 72 | 0.2% |
| 7 | 103 | 0.2% |
| 8 | 78 | 0.2% |
| 9 | 63 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 1256 | 1 | |
| 1140 | 2 | |
| 931 | 1 | |
| 925 | 1 | |
| 900 | 1 | |
| 877 | 1 | |
| 874 | 1 | |
| 840 | 2 | |
| 780 | 1 | |
| 720 | 1 |
spoken_languages
Categorical
HIGH CARDINALITY  IMBALANCE 
| Distinct | 1843 |
|---|---|
| Distinct (%) | 4.0% |
| Missing | 6 |
| Missing (%) | < 0.1% |
| Memory size | 355.9 KiB |
| ['English'] | 22425 |
|---|---|
| [] | 3836 |
| ['Français'] | 1859 |
| ['日本語'] | 1293 |
| ['Italiano'] | 1218 |
| Other values (1838) |
Length
| Max length | 215 |
|---|---|
| Median length | 11 |
| Mean length | 12.926366 |
| Min length | 2 |
Characters and Unicode
| Total characters | 588615 |
|---|---|
| Distinct characters | 176 |
| Distinct categories | 10 |
| Distinct scripts | 15 |
| Distinct blocks | 16 |
Unique
| Unique | 1293 |
|---|---|
| Unique (%) | 2.8% |
Sample
| 1st row | ['English'] |
|---|---|
| 2nd row | ['English', 'Français'] |
| 3rd row | ['English'] |
| 4th row | ['English'] |
| 5th row | ['English'] |
Common Values
| Value | Count | Frequency (%) |
| ['English'] | 22425 | 49.2% |
| [] | 3836 | 8.4% |
| ['Français'] | 1859 | 4.1% |
| ['日本語'] | 1293 | 2.8% |
| ['Italiano'] | 1218 | 2.7% |
| ['Español'] | 902 | 2.0% |
| ['Pусский'] | 807 | 1.8% |
| ['Deutsch'] | 764 | 1.7% |
| ['English', 'Français'] | 682 | 1.5% |
| ['English', 'Español'] | 572 | 1.3% |
| Other values (1833) | 11178 |
Words
| Value | Count | Frequency (%) |
| english | 28787 | |
| | 4816 | 8.2% |
| français | 4206 | 7.2% |
| deutsch | 2628 | 4.5% |
| español | 2413 | 4.1% |
| italiano | 2369 | 4.0% |
| 日本語 | 1762 | 3.0% |
| pусский | 1563 | 2.7% |
| 普通话 | 790 | 1.3% |
| हिन्दी | 709 | 1.2% |
| Other values (69) | 8593 | 14.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| ' | 106772 | |
| [ | 45536 | 7.7% |
| ] | 45536 | 7.7% |
| s | 42367 | 7.2% |
| n | 37543 | 6.4% |
| i | 37190 | 6.3% |
| l | 34695 | 5.9% |
| h | 31521 | 5.4% |
| E | 31257 | 5.3% |
| g | 30474 | 5.2% |
| Other values (166) | 145724 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 292685 | |
| Other Punctuation | 119575 | |
| Uppercase Letter | 46519 | 7.9% |
| Open Punctuation | 45536 | 7.7% |
| Close Punctuation | 45536 | 7.7% |
| Other Letter | 22245 | 3.8% |
| Space Separator | 13100 | 2.2% |
| Spacing Mark | 1842 | 0.3% |
| Nonspacing Mark | 1551 | 0.3% |
| Decimal Number | 26 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| s | 42367 | |
| n | 37543 | |
| i | 37190 | |
| l | 34695 | |
| h | 31521 | |
| g | 30474 | |
| a | 19015 | |
| o | 7067 | 2.4% |
| r | 6144 | 2.1% |
| t | 5985 | 2.0% |
| Other values (64) | 40684 |
Other Letter
| Value | Count | Frequency (%) |
| 語 | 1762 | 7.9% |
| 日 | 1762 | 7.9% |
| 本 | 1762 | 7.9% |
| 话 | 1263 | 5.7% |
| 州 | 946 | 4.3% |
| 通 | 790 | 3.6% |
| 普 | 790 | 3.6% |
| न | 709 | 3.2% |
| ह | 709 | 3.2% |
| द | 709 | 3.2% |
| Other values (46) | 11043 |
Uppercase Letter
| Value | Count | Frequency (%) |
| E | 31257 | |
| F | 4208 | 9.0% |
| D | 2932 | 6.3% |
| P | 2679 | 5.8% |
| I | 2369 | 5.1% |
| N | 833 | 1.8% |
| L | 507 | 1.1% |
| M | 363 | 0.8% |
| T | 308 | 0.7% |
| Č | 286 | 0.6% |
| Other values (13) | 777 | 1.7% |
Spacing Mark
| Value | Count | Frequency (%) |
| ी | 709 | |
| ि | 709 | |
| ు | 136 | 7.4% |
| ி | 111 | 6.0% |
| া | 94 | 5.1% |
| ং | 47 | 2.6% |
| ਾ | 18 | 1.0% |
| ੀ | 18 | 1.0% |
Nonspacing Mark
| Value | Count | Frequency (%) |
| ् | 709 | |
| ִ | 430 | |
| ְ | 215 | 13.9% |
| ் | 111 | 7.2% |
| ె | 68 | 4.4% |
| ੰ | 18 | 1.2% |
Other Punctuation
| Value | Count | Frequency (%) |
| ' | 106772 | |
| , | 11686 | 9.8% |
| / | 1015 | 0.8% |
| \ | 52 | < 0.1% |
| ? | 50 | < 0.1% |
Open Punctuation
| Value | Count | Frequency (%) |
| [ | 45536 |
Close Punctuation
| Value | Count | Frequency (%) |
| ] | 45536 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 13100 |
Decimal Number
| Value | Count | Frequency (%) |
| 9 | 26 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 326809 | |
| Common | 223773 | |
| Han | 10494 | 1.8% |
| Cyrillic | 10460 | 1.8% |
| Devanagari | 4254 | 0.7% |
| Arabic | 3366 | 0.6% |
| Hangul | 3252 | 0.6% |
| Hebrew | 1720 | 0.3% |
| Greek | 1704 | 0.3% |
| Thai | 1246 | 0.2% |
| Other values (5) | 1537 | 0.3% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| s | 42367 | |
| n | 37543 | |
| i | 37190 | |
| l | 34695 | |
| h | 31521 | |
| E | 31257 | |
| g | 30474 | |
| a | 19015 | 5.8% |
| o | 7067 | 2.2% |
| r | 6144 | 1.9% |
| Other values (51) | 49536 |
Cyrillic
| Value | Count | Frequency (%) |
| с | 3213 | |
| к | 1735 | |
| и | 1680 | |
| й | 1616 | |
| у | 1565 | |
| а | 113 | 1.1% |
| р | 87 | 0.8% |
| н | 53 | 0.5% |
| ь | 53 | 0.5% |
| У | 53 | 0.5% |
| Other values (12) | 292 | 2.8% |
Arabic
| Value | Count | Frequency (%) |
| ا | 541 | |
| ر | 541 | |
| ب | 342 | |
| ة | 342 | |
| ي | 342 | |
| ع | 342 | |
| ل | 342 | |
| ی | 144 | 4.3% |
| ف | 144 | 4.3% |
| س | 144 | 4.3% |
| Other values (5) | 142 | 4.2% |
Han
| Value | Count | Frequency (%) |
| 語 | 1762 | |
| 日 | 1762 | |
| 本 | 1762 | |
| 话 | 1263 | |
| 州 | 946 | |
| 通 | 790 | |
| 普 | 790 | |
| 广 | 473 | 4.5% |
| 廣 | 473 | 4.5% |
| 話 | 473 | 4.5% |
Common
| Value | Count | Frequency (%) |
| ' | 106772 | |
| [ | 45536 | |
| ] | 45536 | |
| (space) | 13100 | 5.9% |
| , | 11686 | 5.2% |
| / | 1015 | 0.5% |
| \ | 52 | < 0.1% |
| ? | 50 | < 0.1% |
| 9 | 26 | < 0.1% |
Hebrew
| Value | Count | Frequency (%) |
| ִ | 430 | |
| י | 215 | |
| ע | 215 | |
| ב | 215 | |
| ְ | 215 | |
| ר | 215 | |
| ת | 215 |
Greek
| Value | Count | Frequency (%) |
| λ | 426 | |
| κ | 213 | |
| ε | 213 | |
| η | 213 | |
| ν | 213 | |
| ι | 213 | |
| ά | 213 |
Georgian
| Value | Count | Frequency (%) |
| ი | 33 | |
| ლ | 33 | |
| უ | 33 | |
| თ | 33 | |
| რ | 33 | |
| ა | 33 | |
| ქ | 33 |
Devanagari
| Value | Count | Frequency (%) |
| ी | 709 | |
| न | 709 | |
| ह | 709 | |
| ि | 709 | |
| द | 709 | |
| ् | 709 |
Hangul
| Value | Count | Frequency (%) |
| 선 | 542 | |
| 조 | 542 | |
| 한 | 542 | |
| 국 | 542 | |
| 어 | 542 | |
| 말 | 542 |
Thai
| Value | Count | Frequency (%) |
| า | 356 | |
| ท | 178 | |
| ย | 178 | |
| ษ | 178 | |
| ไ | 178 | |
| ภ | 178 |
Gurmukhi
| Value | Count | Frequency (%) |
| ਾ | 18 | |
| ੀ | 18 | |
| ਬ | 18 | |
| ੰ | 18 | |
| ਜ | 18 | |
| ਪ | 18 |
Telugu
| Value | Count | Frequency (%) |
| ు | 136 | |
| ల | 68 | |
| ె | 68 | |
| త | 68 | |
| గ | 68 |
Tamil
| Value | Count | Frequency (%) |
| ழ | 111 | |
| ி | 111 | |
| த | 111 | |
| ் | 111 | |
| ம | 111 |
Bengali
| Value | Count | Frequency (%) |
| া | 94 | |
| ল | 47 | |
| ং | 47 | |
| ব | 47 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 541734 | |
| CJK | 10494 | 1.8% |
| Cyrillic | 10460 | 1.8% |
| None | 10426 | 1.8% |
| Devanagari | 4254 | 0.7% |
| Arabic | 3366 | 0.6% |
| Hangul | 3252 | 0.6% |
| Hebrew | 1720 | 0.3% |
| Thai | 1246 | 0.2% |
| Tamil | 555 | 0.1% |
| Other values (6) | 1108 | 0.2% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| ' | 106772 | |
| [ | 45536 | |
| ] | 45536 | |
| s | 42367 | 7.8% |
| n | 37543 | 6.9% |
| i | 37190 | 6.9% |
| l | 34695 | 6.4% |
| h | 31521 | 5.8% |
| E | 31257 | 5.8% |
| g | 30474 | 5.6% |
| Other values (44) | 98843 |
None
| Value | Count | Frequency (%) |
| ç | 4453 | |
| ñ | 2413 | |
| ê | 591 | 5.7% |
| λ | 426 | 4.1% |
| Č | 286 | 2.7% |
| ý | 286 | 2.7% |
| ü | 247 | 2.4% |
| κ | 213 | 2.0% |
| ε | 213 | 2.0% |
| η | 213 | 2.0% |
| Other values (10) | 1085 | 10.4% |
Cyrillic
| Value | Count | Frequency (%) |
| с | 3213 | |
| к | 1735 | |
| и | 1680 | |
| й | 1616 | |
| у | 1565 | |
| а | 113 | 1.1% |
| р | 87 | 0.8% |
| н | 53 | 0.5% |
| ь | 53 | 0.5% |
| У | 53 | 0.5% |
| Other values (12) | 292 | 2.8% |
CJK
| Value | Count | Frequency (%) |
| 語 | 1762 | |
| 日 | 1762 | |
| 本 | 1762 | |
| 话 | 1263 | |
| 州 | 946 | |
| 通 | 790 | |
| 普 | 790 | |
| 广 | 473 | 4.5% |
| 廣 | 473 | 4.5% |
| 話 | 473 | 4.5% |
Devanagari
| Value | Count | Frequency (%) |
| ी | 709 | |
| न | 709 | |
| ह | 709 | |
| ि | 709 | |
| द | 709 | |
| ् | 709 |
Hangul
| Value | Count | Frequency (%) |
| 선 | 542 | |
| 조 | 542 | |
| 한 | 542 | |
| 국 | 542 | |
| 어 | 542 | |
| 말 | 542 |
Arabic
| Value | Count | Frequency (%) |
| ا | 541 | |
| ر | 541 | |
| ب | 342 | |
| ة | 342 | |
| ي | 342 | |
| ع | 342 | |
| ل | 342 | |
| ی | 144 | 4.3% |
| ف | 144 | 4.3% |
| س | 144 | 4.3% |
| Other values (5) | 142 | 4.2% |
Hebrew
| Value | Count | Frequency (%) |
| ִ | 430 | |
| י | 215 | |
| ע | 215 | |
| ב | 215 | |
| ְ | 215 | |
| ר | 215 | |
| ת | 215 |
Thai
| Value | Count | Frequency (%) |
| า | 356 | |
| ท | 178 | |
| ย | 178 | |
| ษ | 178 | |
| ไ | 178 | |
| ภ | 178 |
Telugu
| Value | Count | Frequency (%) |
| ు | 136 | |
| ల | 68 | |
| ె | 68 | |
| త | 68 | |
| గ | 68 |
Tamil
| Value | Count | Frequency (%) |
| ழ | 111 | |
| ி | 111 | |
| த | 111 | |
| ் | 111 | |
| ம | 111 |
Bengali
| Value | Count | Frequency (%) |
| া | 94 | |
| ল | 47 | |
| ং | 47 | |
| ব | 47 |
Latin Ext Additional
| Value | Count | Frequency (%) |
| ế | 61 | |
| ệ | 61 |
Georgian
| Value | Count | Frequency (%) |
| ი | 33 | |
| ლ | 33 | |
| უ | 33 | |
| თ | 33 | |
| რ | 33 | |
| ა | 33 | |
| ქ | 33 |
Gurmukhi
| Value | Count | Frequency (%) |
| ਾ | 18 | |
| ੀ | 18 | |
| ਬ | 18 | |
| ੰ | 18 | |
| ਜ | 18 | |
| ਪ | 18 |
IPA Ext
| Value | Count | Frequency (%) |
| ə | 4 |
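The values profiled in this section are stringified lists such as `['English', 'Français']`, which is why brackets, quotes and commas dominate the character counts. A minimal sketch of how such cells could be parsed back into lists and counted per language, assuming every non-missing cell is a valid Python list literal; the helper name `count_languages` and the four sample cells are illustrative, not part of the report:

```python
import ast
from collections import Counter


def count_languages(cells):
    """Count individual languages across stringified list cells."""
    counts = Counter()
    for cell in cells:
        if not cell:  # skip missing or empty cells
            continue
        # literal_eval safely parses a list literal without executing code
        counts.update(ast.literal_eval(cell))
    return counts


# Illustrative cells in the same shape as the values listed above
cells = ["['English', 'Français']", "['Español']", "['English', 'Español']", "[]"]
counts = count_languages(cells)
print(counts.most_common())
```

Counting per language rather than per list removes the cardinality inflation caused by every distinct combination being treated as its own category.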
status
Categorical
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 87 |
| Missing (%) | 0.2% |
| Memory size | 355.9 KiB |
| Released | 45088 |
|---|---|
| Rumored | 232 |
| Post Production | 98 |
| In Production | 20 |
| Planned | 15 |
Length
| Max length | 15 |
|---|---|
| Median length | 8 |
| Mean length | 8.0118579 |
| Min length | 7 |
Characters and Unicode
| Total characters | 364179 |
|---|---|
| Distinct characters | 18 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Released |
|---|---|
| 2nd row | Released |
| 3rd row | Released |
| 4th row | Released |
| 5th row | Released |
Common Values
| Value | Count | Frequency (%) |
| Released | 45088 | |
| Rumored | 232 | 0.5% |
| Post Production | 98 | 0.2% |
| In Production | 20 | < 0.1% |
| Planned | 15 | < 0.1% |
| Canceled | 2 | < 0.1% |
| (Missing) | 87 | 0.2% |
Length
Common Values (Plot)
Words
| Value | Count | Frequency (%) |
| released | 45088 | |
| rumored | 232 | 0.5% |
| production | 118 | 0.3% |
| post | 98 | 0.2% |
| in | 20 | < 0.1% |
| planned | 15 | < 0.1% |
| canceled | 2 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 135515 | |
| d | 45455 | 12.5% |
| R | 45320 | 12.4% |
| s | 45186 | 12.4% |
| l | 45105 | 12.4% |
| a | 45105 | 12.4% |
| o | 566 | 0.2% |
| r | 350 | 0.1% |
| u | 350 | 0.1% |
| m | 232 | 0.1% |
| Other values (8) | 995 | 0.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 318488 | |
| Uppercase Letter | 45573 | 12.5% |
| Space Separator | 118 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 135515 | |
| d | 45455 | 14.3% |
| s | 45186 | 14.2% |
| l | 45105 | 14.2% |
| a | 45105 | 14.2% |
| o | 566 | 0.2% |
| r | 350 | 0.1% |
| u | 350 | 0.1% |
| m | 232 | 0.1% |
| t | 216 | 0.1% |
| Other values (3) | 408 | 0.1% |
Uppercase Letter
| Value | Count | Frequency (%) |
| R | 45320 | |
| P | 231 | 0.5% |
| I | 20 | < 0.1% |
| C | 2 | < 0.1% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 118 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 364061 | |
| Common | 118 | < 0.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 135515 | |
| d | 45455 | 12.5% |
| R | 45320 | 12.4% |
| s | 45186 | 12.4% |
| l | 45105 | 12.4% |
| a | 45105 | 12.4% |
| o | 566 | 0.2% |
| r | 350 | 0.1% |
| u | 350 | 0.1% |
| m | 232 | 0.1% |
| Other values (7) | 877 | 0.2% |
Common
| Value | Count | Frequency (%) |
| (space) | 118 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 364179 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 135515 | |
| d | 45455 | 12.5% |
| R | 45320 | 12.4% |
| s | 45186 | 12.4% |
| l | 45105 | 12.4% |
| a | 45105 | 12.4% |
| o | 566 | 0.2% |
| r | 350 | 0.1% |
| u | 350 | 0.1% |
| m | 232 | 0.1% |
| Other values (8) | 995 | 0.3% |
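The report's alerts flag status as highly imbalanced (96.9%). One definition that reproduces this figure from the Common Values counts is one minus the Shannon entropy normalized by its maximum, log2 of the number of classes. The sketch below assumes that definition; the report itself does not state the formula, and the function name `imbalance_score` is hypothetical:

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy in bits over k classes."""
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Counts taken from the Common Values table for status (missing values excluded)
status_counts = {
    "Released": 45088,
    "Rumored": 232,
    "Post Production": 98,
    "In Production": 20,
    "Planned": 15,
    "Canceled": 2,
}
score = imbalance_score(list(status_counts.values()))
print(f"{score:.1%}")
```

A score of 0 corresponds to perfectly balanced classes and a score near 1 to a single dominant class, which is the situation here with Released.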
tagline
Categorical
HIGH CARDINALITY  MISSING  UNIFORM 
| Distinct | 20283 |
|---|---|
| Distinct (%) | 99.2% |
| Missing | 25103 |
| Missing (%) | 55.1% |
| Memory size | 355.9 KiB |
| Which one is the first to return - memory or the murderer? | 9 |
|---|---|
| Based on a true story. | 7 |
| Pokémon: Spell of the Unknown | 4 |
| There is no solitude greater than that of the Samurai | 4 |
| A love, a hope, a wall. | 4 |
| Other values (20278) |
Length
| Max length | 297 |
|---|---|
| Median length | 204 |
| Mean length | 47.006752 |
| Min length | 1 |
Characters and Unicode
| Total characters | 960771 |
|---|---|
| Distinct characters | 170 |
| Distinct categories | 17 |
| Distinct scripts | 6 |
| Distinct blocks | 10 |
Unique
| Unique | 20174 |
|---|---|
| Unique (%) | 98.7% |
Sample
| 1st row | Roll the dice and unleash the excitement! |
|---|---|
| 2nd row | Still Yelling. Still Fighting. Still Ready for Love. |
| 3rd row | Friends are the people who let you be yourself... and never let you forget it. |
| 4th row | Just When His World Is Back To Normal... He's In For The Surprise Of His Life! |
| 5th row | A Los Angeles Crime Saga |
Common Values
| Value | Count | Frequency (%) |
| Which one is the first to return - memory or the murderer? | 9 | < 0.1% |
| Based on a true story. | 7 | < 0.1% |
| Pokémon: Spell of the Unknown | 4 | < 0.1% |
| There is no solitude greater than that of the Samurai | 4 | < 0.1% |
| A love, a hope, a wall. | 4 | < 0.1% |
| Trust no one. | 4 | < 0.1% |
| Every woman who has loved will understand | 4 | < 0.1% |
| Some things are better left top secret. | 4 | < 0.1% |
| From the very beginning, they knew they'd be friends to the end. What they didn't count on was everything in between. | 4 | < 0.1% |
| - | 4 | < 0.1% |
| Other values (20273) | 20391 | |
| (Missing) | 25103 |
Length
Words
| Value | Count | Frequency (%) |
| the | 11031 | 6.3% |
| a | 6831 | 3.9% |
| of | 4412 | 2.5% |
| to | 3594 | 2.1% |
| is | 2808 | 1.6% |
| in | 2698 | 1.5% |
| and | 2688 | 1.5% |
| you | 2392 | 1.4% |
| (empty) | 1591 | 0.9% |
| for | 1525 | 0.9% |
| Other values (15108) | 134744 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 154023 | |
| e | 94648 | 9.9% |
| t | 57409 | 6.0% |
| o | 56689 | 5.9% |
| a | 51572 | 5.4% |
| n | 47624 | 5.0% |
| i | 46141 | 4.8% |
| r | 45119 | 4.7% |
| s | 42435 | 4.4% |
| h | 37258 | 3.9% |
| Other values (160) | 327853 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 682053 | |
| Space Separator | 154023 | 16.0% |
| Uppercase Letter | 75091 | 7.8% |
| Other Punctuation | 44643 | 4.6% |
| Decimal Number | 2687 | 0.3% |
| Dash Punctuation | 1954 | 0.2% |
| Final Punctuation | 98 | < 0.1% |
| Open Punctuation | 56 | < 0.1% |
| Close Punctuation | 55 | < 0.1% |
| Currency Symbol | 37 | < 0.1% |
| Other values (7) | 74 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 94648 | |
| t | 57409 | 8.4% |
| o | 56689 | 8.3% |
| a | 51572 | 7.6% |
| n | 47624 | 7.0% |
| i | 46141 | 6.8% |
| r | 45119 | 6.6% |
| s | 42435 | 6.2% |
| h | 37258 | 5.5% |
| l | 30231 | 4.4% |
| Other values (43) | 172927 |
Other Letter
| Value | Count | Frequency (%) |
| 劇 | 1 | 2.9% |
| ஆ | 1 | 2.9% |
| 時 | 1 | 2.9% |
| 熟 | 1 | 2.9% |
| த | 1 | 2.9% |
| வ | 1 | 2.9% |
| ன | 1 | 2.9% |
| 后 | 1 | 2.9% |
| 場 | 1 | 2.9% |
| 版 | 1 | 2.9% |
| Other values (24) | 24 |
Uppercase Letter
| Value | Count | Frequency (%) |
| T | 10017 | 13.3% |
| A | 6885 | 9.2% |
| S | 5663 | 7.5% |
| H | 4407 | 5.9% |
| I | 4388 | 5.8% |
| E | 4311 | 5.7% |
| W | 3691 | 4.9% |
| O | 3482 | 4.6% |
| N | 3201 | 4.3% |
| L | 3198 | 4.3% |
| Other values (20) | 25848 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 26674 | |
| ! | 5785 | 13.0% |
| ' | 5680 | 12.7% |
| , | 4239 | 9.5% |
| ? | 1167 | 2.6% |
| " | 582 | 1.3% |
| … | 148 | 0.3% |
| : | 140 | 0.3% |
| & | 84 | 0.2% |
| * | 42 | 0.1% |
| Other values (7) | 102 | 0.2% |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 802 | |
| 1 | 516 | |
| 2 | 299 | 11.1% |
| 9 | 208 | 7.7% |
| 3 | 208 | 7.7% |
| 5 | 168 | 6.3% |
| 4 | 140 | 5.2% |
| 7 | 121 | 4.5% |
| 6 | 121 | 4.5% |
| 8 | 104 | 3.9% |
Math Symbol
| Value | Count | Frequency (%) |
| + | 5 | |
| = | 5 | |
| | | 2 | 14.3% |
| ~ | 1 | 7.1% |
| − | 1 | 7.1% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 1937 | |
| – | 9 | 0.5% |
| — | 8 | 0.4% |
Final Punctuation
| Value | Count | Frequency (%) |
| ’ | 82 | |
| ” | 15 | 15.3% |
| » | 1 | 1.0% |
Initial Punctuation
| Value | Count | Frequency (%) |
| “ | 14 | |
| ‘ | 4 | 21.1% |
| « | 1 | 5.3% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 49 | |
| [ | 7 | 12.5% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 48 | |
| ] | 7 | 12.7% |
Other Number
| Value | Count | Frequency (%) |
| ½ | 2 | |
| ² | 1 |
Modifier Letter
| Value | Count | Frequency (%) |
| ˌ | 1 | |
| ˈ | 1 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 154023 |
Currency Symbol
| Value | Count | Frequency (%) |
| $ | 37 |
Nonspacing Mark
| Value | Count | Frequency (%) |
| ் | 1 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 757144 | |
| Common | 203592 | 21.2% |
| Han | 21 | < 0.1% |
| Tamil | 5 | < 0.1% |
| Hiragana | 5 | < 0.1% |
| Katakana | 4 | < 0.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 94648 | 12.5% |
| t | 57409 | 7.6% |
| o | 56689 | 7.5% |
| a | 51572 | 6.8% |
| n | 47624 | 6.3% |
| i | 46141 | 6.1% |
| r | 45119 | 6.0% |
| s | 42435 | 5.6% |
| h | 37258 | 4.9% |
| l | 30231 | 4.0% |
| Other values (73) | 248018 |
Common
| Value | Count | Frequency (%) |
| (space) | 154023 | |
| . | 26674 | 13.1% |
| ! | 5785 | 2.8% |
| ' | 5680 | 2.8% |
| , | 4239 | 2.1% |
| - | 1937 | 1.0% |
| ? | 1167 | 0.6% |
| 0 | 802 | 0.4% |
| " | 582 | 0.3% |
| 1 | 516 | 0.3% |
| Other values (42) | 2187 | 1.1% |
Han
| Value | Count | Frequency (%) |
| 劇 | 1 | 4.8% |
| 時 | 1 | 4.8% |
| 熟 | 1 | 4.8% |
| 后 | 1 | 4.8% |
| 場 | 1 | 4.8% |
| 版 | 1 | 4.8% |
| 桃 | 1 | 4.8% |
| 舞 | 1 | 4.8% |
| 的 | 1 | 4.8% |
| 最 | 1 | 4.8% |
| Other values (11) | 11 |
Tamil
| Value | Count | Frequency (%) |
| ஆ | 1 | |
| த | 1 | |
| வ | 1 | |
| ன | 1 | |
| ் | 1 |
Hiragana
| Value | Count | Frequency (%) |
| は | 1 | |
| し | 1 | |
| て | 1 | |
| い | 1 | |
| る | 1 |
Katakana
| Value | Count | Frequency (%) |
| ク | 1 | |
| ラ | 1 | |
| ナ | 1 | |
| ド | 1 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 960339 | |
| Punctuation | 280 | < 0.1% |
| None | 112 | < 0.1% |
| CJK | 21 | < 0.1% |
| Tamil | 5 | < 0.1% |
| Hiragana | 5 | < 0.1% |
| Katakana | 4 | < 0.1% |
| IPA Ext | 2 | < 0.1% |
| Modifier Letters | 2 | < 0.1% |
| Math Operators | 1 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 154023 | |
| e | 94648 | 9.9% |
| t | 57409 | 6.0% |
| o | 56689 | 5.9% |
| a | 51572 | 5.4% |
| n | 47624 | 5.0% |
| i | 46141 | 4.8% |
| r | 45119 | 4.7% |
| s | 42435 | 4.4% |
| h | 37258 | 3.9% |
| Other values (78) | 327421 |
Punctuation
| Value | Count | Frequency (%) |
| … | 148 | |
| ’ | 82 | |
| ” | 15 | 5.4% |
| “ | 14 | 5.0% |
| – | 9 | 3.2% |
| — | 8 | 2.9% |
| ‘ | 4 | 1.4% |
None
| Value | Count | Frequency (%) |
| é | 20 | |
| ä | 16 | |
| ö | 8 | 7.1% |
| ó | 6 | 5.4% |
| á | 6 | 5.4% |
| ü | 5 | 4.5% |
| ı | 5 | 4.5% |
| í | 5 | 4.5% |
| · | 4 | 3.6% |
| ć | 3 | 2.7% |
| Other values (26) | 34 |
IPA Ext
| Value | Count | Frequency (%) |
| ə | 2 |
CJK
| Value | Count | Frequency (%) |
| 劇 | 1 | 4.8% |
| 時 | 1 | 4.8% |
| 熟 | 1 | 4.8% |
| 后 | 1 | 4.8% |
| 場 | 1 | 4.8% |
| 版 | 1 | 4.8% |
| 桃 | 1 | 4.8% |
| 舞 | 1 | 4.8% |
| 的 | 1 | 4.8% |
| 最 | 1 | 4.8% |
| Other values (11) | 11 |
Tamil
| Value | Count | Frequency (%) |
| ஆ | 1 | |
| த | 1 | |
| வ | 1 | |
| ன | 1 | |
| ் | 1 |
Katakana
| Value | Count | Frequency (%) |
| ク | 1 | |
| ラ | 1 | |
| ナ | 1 | |
| ド | 1 |
Modifier Letters
| Value | Count | Frequency (%) |
| ˌ | 1 | |
| ˈ | 1 |
Hiragana
| Value | Count | Frequency (%) |
| は | 1 | |
| し | 1 | |
| て | 1 | |
| い | 1 | |
| る | 1 |
Math Operators
| Value | Count | Frequency (%) |
| − | 1 |
title
Categorical
HIGH CARDINALITY  UNIFORM 
| Distinct | 42277 |
|---|---|
| Distinct (%) | 92.8% |
| Missing | 6 |
| Missing (%) | < 0.1% |
| Memory size | 355.9 KiB |
| Blackout | 13 |
|---|---|
| Cinderella | 11 |
| Alice in Wonderland | 9 |
| Hamlet | 9 |
| Beauty and the Beast | 8 |
| Other values (42272) |
Length
| Max length | 105 |
|---|---|
| Median length | 79 |
| Mean length | 16.707265 |
| Min length | 1 |
Characters and Unicode
| Total characters | 760782 |
|---|---|
| Distinct characters | 287 |
| Distinct categories | 17 |
| Distinct scripts | 7 |
| Distinct blocks | 12 |
Unique
| Unique | 39935 |
|---|---|
| Unique (%) | 87.7% |
Sample
| 1st row | Toy Story |
|---|---|
| 2nd row | Jumanji |
| 3rd row | Grumpier Old Men |
| 4th row | Waiting to Exhale |
| 5th row | Father of the Bride Part II |
Common Values
| Value | Count | Frequency (%) |
| Blackout | 13 | < 0.1% |
| Cinderella | 11 | < 0.1% |
| Alice in Wonderland | 9 | < 0.1% |
| Hamlet | 9 | < 0.1% |
| Beauty and the Beast | 8 | < 0.1% |
| King Lear | 8 | < 0.1% |
| Les Misérables | 8 | < 0.1% |
| The Promise | 8 | < 0.1% |
| The Three Musketeers | 7 | < 0.1% |
| A Christmas Carol | 7 | < 0.1% |
| Other values (42267) | 45448 |
Length
Words
| Value | Count | Frequency (%) |
| the | 14593 | 10.7% |
| of | 4952 | 3.6% |
| a | 2251 | 1.6% |
| in | 1697 | 1.2% |
| and | 1640 | 1.2% |
| to | 1057 | 0.8% |
| (empty) | 766 | 0.6% |
| man | 665 | 0.5% |
| love | 664 | 0.5% |
| for | 602 | 0.4% |
| Other values (24431) | 107791 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 91164 | 12.0% |
| e | 76538 | 10.1% |
| a | 49135 | 6.5% |
| o | 45856 | 6.0% |
| n | 40986 | 5.4% |
| r | 40160 | 5.3% |
| i | 39898 | 5.2% |
| t | 36835 | 4.8% |
| s | 29641 | 3.9% |
| h | 28607 | 3.8% |
| Other values (277) | 281962 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 536254 | |
| Uppercase Letter | 117663 | 15.5% |
| Space Separator | 91164 | 12.0% |
| Other Punctuation | 10524 | 1.4% |
| Decimal Number | 3873 | 0.5% |
| Dash Punctuation | 990 | 0.1% |
| Close Punctuation | 87 | < 0.1% |
| Open Punctuation | 85 | < 0.1% |
| Final Punctuation | 38 | < 0.1% |
| Other Letter | 25 | < 0.1% |
| Other values (7) | 79 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 76538 | |
| a | 49135 | |
| o | 45856 | 8.6% |
| n | 40986 | 7.6% |
| r | 40160 | 7.5% |
| i | 39898 | 7.4% |
| t | 36835 | 6.9% |
| s | 29641 | 5.5% |
| h | 28607 | 5.3% |
| l | 26042 | 4.9% |
| Other values (121) | 122556 |
Uppercase Letter
| Value | Count | Frequency (%) |
| T | 16055 | |
| S | 10365 | 8.8% |
| M | 8046 | 6.8% |
| B | 7691 | 6.5% |
| C | 7194 | 6.1% |
| A | 6815 | 5.8% |
| D | 6368 | 5.4% |
| L | 5890 | 5.0% |
| H | 5183 | 4.4% |
| W | 5183 | 4.4% |
| Other values (65) | 38873 |
Other Letter
| Value | Count | Frequency (%) |
| ی | 2 | 8.0% |
| چ | 2 | 8.0% |
| ه | 2 | 8.0% |
| ک | 2 | 8.0% |
| ª | 1 | 4.0% |
| 傳 | 1 | 4.0% |
| 空 | 1 | 4.0% |
| 時 | 1 | 4.0% |
| 狗 | 1 | 4.0% |
| ا | 1 | 4.0% |
| Other values (11) | 11 |
Other Punctuation
| Value | Count | Frequency (%) |
| : | 3735 | |
| ' | 2512 | |
| . | 1604 | |
| , | 1139 | 10.8% |
| ! | 648 | 6.2% |
| & | 460 | 4.4% |
| ? | 269 | 2.6% |
| / | 80 | 0.8% |
| * | 19 | 0.2% |
| # | 13 | 0.1% |
| Other values (8) | 45 | 0.4% |
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 864 | |
| 1 | 703 | |
| 0 | 619 | |
| 3 | 484 | |
| 9 | 232 | 6.0% |
| 4 | 231 | 6.0% |
| 5 | 227 | 5.9% |
| 7 | 196 | 5.1% |
| 8 | 161 | 4.2% |
| 6 | 156 | 4.0% |
Math Symbol
| Value | Count | Frequency (%) |
| + | 17 | |
| × | 3 | 12.5% |
| = | 1 | 4.2% |
| ∞ | 1 | 4.2% |
| − | 1 | 4.2% |
| → | 1 | 4.2% |
Other Number
| Value | Count | Frequency (%) |
| ½ | 12 | |
| ² | 3 | 15.8% |
| ³ | 2 | 10.5% |
| ⅓ | 1 | 5.3% |
| ⁴ | 1 | 5.3% |
Other Symbol
| Value | Count | Frequency (%) |
| ° | 3 | |
| ☆ | 2 | |
| ™ | 1 | 12.5% |
| ♡ | 1 | 12.5% |
| № | 1 | 12.5% |
Currency Symbol
| Value | Count | Frequency (%) |
| $ | 18 | |
| ¢ | 2 | 9.5% |
| £ | 1 | 4.8% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 975 | |
| – | 15 | 1.5% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 82 | |
| ] | 5 | 5.7% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 80 | |
| [ | 5 | 5.9% |
Final Punctuation
| Value | Count | Frequency (%) |
| ’ | 37 | |
| ” | 1 | 2.6% |
Initial Punctuation
| Value | Count | Frequency (%) |
| “ | 1 | |
| ‘ | 1 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 91164 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 3 |
Format
| Value | Count | Frequency (%) |
| | 2 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 653387 | |
| Common | 106840 | 14.0% |
| Cyrillic | 361 | < 0.1% |
| Greek | 170 | < 0.1% |
| Arabic | 11 | < 0.1% |
| Katakana | 8 | < 0.1% |
| Han | 5 | < 0.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 76538 | 11.7% |
| a | 49135 | 7.5% |
| o | 45856 | 7.0% |
| n | 40986 | 6.3% |
| r | 40160 | 6.1% |
| i | 39898 | 6.1% |
| t | 36835 | 5.6% |
| s | 29641 | 4.5% |
| h | 28607 | 4.4% |
| l | 26042 | 4.0% |
| Other values (107) | 239689 |
Common
| Value | Count | Frequency (%) |
| (space) | 91164 | |
| : | 3735 | 3.5% |
| ' | 2512 | 2.4% |
| . | 1604 | 1.5% |
| , | 1139 | 1.1% |
| - | 975 | 0.9% |
| 2 | 864 | 0.8% |
| 1 | 703 | 0.7% |
| ! | 648 | 0.6% |
| 0 | 619 | 0.6% |
| Other values (50) | 2877 | 2.7% |
Cyrillic
| Value | Count | Frequency (%) |
| е | 33 | 9.1% |
| о | 32 | 8.9% |
| а | 32 | 8.9% |
| н | 26 | 7.2% |
| и | 24 | 6.6% |
| р | 23 | 6.4% |
| к | 17 | 4.7% |
| в | 16 | 4.4% |
| с | 15 | 4.2% |
| т | 14 | 3.9% |
| Other values (38) | 129 |
Greek
| Value | Count | Frequency (%) |
| α | 20 | 11.8% |
| ι | 14 | 8.2% |
| ο | 14 | 8.2% |
| τ | 9 | 5.3% |
| λ | 8 | 4.7% |
| ά | 8 | 4.7% |
| ρ | 8 | 4.7% |
| ν | 7 | 4.1% |
| ε | 6 | 3.5% |
| π | 6 | 3.5% |
| Other values (32) | 70 |
Katakana
| Value | Count | Frequency (%) |
| タ | 1 | |
| ン | 1 | |
| ポ | 1 | |
| ィ | 1 | |
| テ | 1 | |
| ス | 1 | |
| ァ | 1 | |
| フ | 1 |
Arabic
| Value | Count | Frequency (%) |
| ی | 2 | |
| چ | 2 | |
| ه | 2 | |
| ک | 2 | |
| ا | 1 | |
| س | 1 | |
| ج | 1 |
Han
| Value | Count | Frequency (%) |
| 傳 | 1 | |
| 空 | 1 | |
| 時 | 1 | |
| 狗 | 1 | |
| 貓 | 1 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 759185 | |
| None | 1141 | 0.1% |
| Cyrillic | 361 | < 0.1% |
| Punctuation | 62 | < 0.1% |
| Arabic | 11 | < 0.1% |
| Katakana | 8 | < 0.1% |
| CJK | 5 | < 0.1% |
| Misc Symbols | 3 | < 0.1% |
| Letterlike Symbols | 2 | < 0.1% |
| Math Operators | 2 | < 0.1% |
| Other values (2) | 2 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 91164 | 12.0% |
| e | 76538 | 10.1% |
| a | 49135 | 6.5% |
| o | 45856 | 6.0% |
| n | 40986 | 5.4% |
| r | 40160 | 5.3% |
| i | 39898 | 5.3% |
| t | 36835 | 4.9% |
| s | 29641 | 3.9% |
| h | 28607 | 3.8% |
| Other values (76) | 280365 |
None
| Value | Count | Frequency (%) |
| é | 222 | |
| ä | 129 | 11.3% |
| ö | 58 | 5.1% |
| è | 54 | 4.7% |
| ô | 44 | 3.9% |
| ü | 39 | 3.4% |
| ó | 37 | 3.2% |
| á | 35 | 3.1% |
| ı | 35 | 3.1% |
| à | 33 | 2.9% |
| Other values (108) | 455 |
Punctuation
| Value | Count | Frequency (%) |
| ’ | 37 | |
| – | 15 | |
| … | 5 | 8.1% |
| | 2 | 3.2% |
| “ | 1 | 1.6% |
| ‘ | 1 | 1.6% |
| ” | 1 | 1.6% |
Cyrillic
| Value | Count | Frequency (%) |
| е | 33 | 9.1% |
| о | 32 | 8.9% |
| а | 32 | 8.9% |
| н | 26 | 7.2% |
| и | 24 | 6.6% |
| р | 23 | 6.4% |
| к | 17 | 4.7% |
| в | 16 | 4.4% |
| с | 15 | 4.2% |
| т | 14 | 3.9% |
| Other values (38) | 129 |
Arabic
| Value | Count | Frequency (%) |
| ی | 2 | |
| چ | 2 | |
| ه | 2 | |
| ک | 2 | |
| ا | 1 | |
| س | 1 | |
| ج | 1 |
Misc Symbols
| Value | Count | Frequency (%) |
| ☆ | 2 | |
| ♡ | 1 |
CJK
| Value | Count | Frequency (%) |
| 傳 | 1 | |
| 空 | 1 | |
| 時 | 1 | |
| 狗 | 1 | |
| 貓 | 1 |
Number Forms
| Value | Count | Frequency (%) |
| ⅓ | 1 |
Letterlike Symbols
| Value | Count | Frequency (%) |
| ™ | 1 | |
| № | 1 |
Katakana
| Value | Count | Frequency (%) |
| タ | 1 | |
| ン | 1 | |
| ポ | 1 | |
| ィ | 1 | |
| テ | 1 | |
| ス | 1 | |
| ァ | 1 | |
| フ | 1 |
Math Operators
| Value | Count | Frequency (%) |
| ∞ | 1 | |
| − | 1 |
Arrows
| Value | Count | Frequency (%) |
| → | 1 |
vote_average
Real number (ℝ)
| Distinct | 92 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 6 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.6181087 |
| Minimum | 0 |
|---|---|
| Maximum | 10 |
| Zeros | 3005 |
| Zeros (%) | 6.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 355.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 5 |
| median | 6 |
| Q3 | 6.8 |
| 95-th percentile | 7.8 |
| Maximum | 10 |
| Range | 10 |
| Interquartile range (IQR) | 1.8 |
Descriptive statistics
| Standard deviation | 1.9243622 |
|---|---|
| Coefficient of variation (CV) | 0.34252848 |
| Kurtosis | 2.5009355 |
| Mean | 5.6181087 |
| Median Absolute Deviation (MAD) | 0.9 |
| Skewness | -1.5193805 |
| Sum | 255826.2 |
| Variance | 3.70317 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 3005 | 6.6% |
| 6 | 2471 | 5.4% |
| 5 | 2009 | 4.4% |
| 7 | 1888 | 4.1% |
| 6.5 | 1722 | 3.8% |
| 6.3 | 1605 | 3.5% |
| 5.5 | 1383 | 3.0% |
| 5.8 | 1370 | 3.0% |
| 6.4 | 1354 | 3.0% |
| 6.7 | 1351 | 3.0% |
| Other values (82) | 27378 |
| Value | Count | Frequency (%) |
| 0 | 3005 | |
| 0.5 | 13 | < 0.1% |
| 0.7 | 1 | < 0.1% |
| 1 | 105 | 0.2% |
| 1.1 | 1 | < 0.1% |
| 1.2 | 4 | < 0.1% |
| 1.3 | 13 | < 0.1% |
| 1.4 | 5 | < 0.1% |
| 1.5 | 30 | 0.1% |
| 1.6 | 6 | < 0.1% |
| Value | Count | Frequency (%) |
| 10 | 190 | |
| 9.8 | 1 | < 0.1% |
| 9.6 | 1 | < 0.1% |
| 9.5 | 18 | < 0.1% |
| 9.4 | 3 | < 0.1% |
| 9.3 | 18 | < 0.1% |
| 9.2 | 4 | < 0.1% |
| 9.1 | 3 | < 0.1% |
| 9 | 160 | |
| 8.9 | 7 | < 0.1% |
vote_count
Real number (ℝ)
| Distinct | 1820 |
|---|---|
| Distinct (%) | 4.0% |
| Missing | 6 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 109.78872 |
| Minimum | 0 |
|---|---|
| Maximum | 14075 |
| Zeros | 2906 |
| Zeros (%) | 6.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 355.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 3 |
| median | 10 |
| Q3 | 34 |
| 95-th percentile | 433 |
| Maximum | 14075 |
| Range | 14075 |
| Interquartile range (IQR) | 31 |
Descriptive statistics
| Standard deviation | 490.91574 |
|---|---|
| Coefficient of variation (CV) | 4.471459 |
| Kurtosis | 151.45026 |
| Mean | 109.78872 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | 10.458626 |
| Sum | 4999339 |
| Variance | 240998.27 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 3268 | 7.2% |
| 2 | 3133 | 6.9% |
| 0 | 2906 | 6.4% |
| 3 | 2799 | 6.1% |
| 4 | 2482 | 5.4% |
| 5 | 2099 | 4.6% |
| 6 | 1747 | 3.8% |
| 7 | 1574 | 3.5% |
| 8 | 1360 | 3.0% |
| 9 | 1195 | 2.6% |
| Other values (1810) | 22973 |
| Value | Count | Frequency (%) |
| 0 | 2906 | |
| 1 | 3268 | |
| 2 | 3133 | |
| 3 | 2799 | |
| 4 | 2482 | |
| 5 | 2099 | |
| 6 | 1747 | |
| 7 | 1574 | |
| 8 | 1360 | |
| 9 | 1195 | 2.6% |
| Value | Count | Frequency (%) |
| 14075 | 1 | |
| 12269 | 1 | |
| 12114 | 1 | |
| 12000 | 1 | |
| 11444 | 1 | |
| 11187 | 1 | |
| 10297 | 1 | |
| 10014 | 1 | |
| 9678 | 1 | |
| 9634 | 1 |
release_year
Real number (ℝ)
| Distinct | 135 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 90 |
| Missing (%) | 0.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1991.8826 |
| Minimum | 1874 |
|---|---|
| Maximum | 2020 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 355.9 KiB |
Quantile statistics
| Minimum | 1874 |
|---|---|
| 5-th percentile | 1941 |
| Q1 | 1978 |
| median | 2001 |
| Q3 | 2010 |
| 95-th percentile | 2015 |
| Maximum | 2020 |
| Range | 146 |
| Interquartile range (IQR) | 32 |
Descriptive statistics
| Standard deviation | 24.05775 |
|---|---|
| Coefficient of variation (CV) | 0.012077896 |
| Kurtosis | 0.84069867 |
| Mean | 1991.8826 |
| Median Absolute Deviation (MAD) | 12 |
| Skewness | -1.2253957 |
| Sum | 90535047 |
| Variance | 578.77535 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2014 | 1976 | 4.3% |
| 2015 | 1907 | 4.2% |
| 2013 | 1895 | 4.2% |
| 2012 | 1727 | 3.8% |
| 2011 | 1669 | 3.7% |
| 2016 | 1604 | 3.5% |
| 2009 | 1591 | 3.5% |
| 2010 | 1501 | 3.3% |
| 2008 | 1482 | 3.3% |
| 2007 | 1322 | 2.9% |
| Other values (125) | 28778 |
| Value | Count | Frequency (%) |
| 1874 | 1 | < 0.1% |
| 1878 | 1 | < 0.1% |
| 1883 | 1 | < 0.1% |
| 1887 | 1 | < 0.1% |
| 1888 | 2 | < 0.1% |
| 1890 | 5 | < 0.1% |
| 1891 | 6 | |
| 1892 | 3 | < 0.1% |
| 1893 | 1 | < 0.1% |
| 1894 | 13 |
| Value | Count | Frequency (%) |
| 2020 | 1 | < 0.1% |
| 2018 | 5 | < 0.1% |
| 2017 | 532 | 1.2% |
| 2016 | 1604 | |
| 2015 | 1907 | |
| 2014 | 1976 | |
| 2013 | 1895 | |
| 2012 | 1727 | |
| 2011 | 1669 | |
| 2010 | 1501 |
return
Real number (ℝ)
INFINITE  MISSING  ZEROS 
| Distinct | 5233 |
|---|---|
| Distinct (%) | 47.8% |
| Missing | 34592 |
| Missing (%) | 76.0% |
| Infinite | 2035 |
| Infinite (%) | 4.5% |
| Mean | inf |
| Minimum | 0 |
|---|---|
| Maximum | inf |
| Zeros | 3522 |
| Zeros (%) | 7.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 355.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1.2728531 |
| Q3 | 7.2645927 |
| 95-th percentile | nan |
| Maximum | inf |
| Range | inf |
| Interquartile range (IQR) | 7.2645927 |
Descriptive statistics
| Standard deviation | nan |
|---|---|
| Coefficient of variation (CV) | nan |
| Kurtosis | nan |
| Mean | inf |
| Median Absolute Deviation (MAD) | 1.2728531 |
| Skewness | nan |
| Sum | inf |
| Variance | nan |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 3522 | 7.7% |
| inf | 2035 | 4.5% |
| 1 | 20 | < 0.1% |
| 2 | 12 | < 0.1% |
| 4 | 11 | < 0.1% |
| 5 | 8 | < 0.1% |
| 3 | 7 | < 0.1% |
| 1.333333333 | 7 | < 0.1% |
| 2.5 | 7 | < 0.1% |
| 1.5 | 6 | < 0.1% |
| Other values (5223) | 5315 | 11.7% |
| (Missing) | 34592 |
| Value | Count | Frequency (%) |
| 0 | 3522 | |
| 5.217391304 × 10⁻⁷ | 1 | < 0.1% |
| 7.5 × 10⁻⁷ | 1 | < 0.1% |
| 9.375 × 10⁻⁷ | 1 | < 0.1% |
| 1.499133126 × 10⁻⁶ | 1 | < 0.1% |
| 1.8 × 10⁻⁶ | 1 | < 0.1% |
| 1.916666667 × 10⁻⁶ | 1 | < 0.1% |
| 3.5 × 10⁻⁶ | 1 | < 0.1% |
| 4 × 10⁻⁶ | 1 | < 0.1% |
| 5.111111111 × 10⁻⁶ | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| inf | 2035 | 4.5% |
| 12396383 | 1 | < 0.1% |
| 8500000 | 1 | < 0.1% |
| 4197476.625 | 1 | < 0.1% |
| 2755584 | 1 | < 0.1% |
| 1018619.283 | 1 | < 0.1% |
| 1000000 | 1 | < 0.1% |
| 26881.72043 | 1 | < 0.1% |
| 12890.38667 | 1 | < 0.1% |
| 5330.33945 | 1 | < 0.1% |
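The `inf` mean and the `nan` standard deviation and 95th percentile reported for `return` follow from its infinite entries. The sketch below assumes that `return` was derived as `revenue / budget` with IEEE 754 float semantics; the sample rows of this report are consistent with that, but the report does not state it. `ieee_ratio` is a hypothetical helper, and the four (revenue, budget) pairs are copied from the sample rows.

```python
import math
import statistics


def ieee_ratio(revenue: float, budget: float) -> float:
    """Divide like a float column would: x/0 -> inf, 0/0 -> nan (assumed semantics)."""
    if budget == 0:
        return math.nan if revenue == 0 else math.copysign(math.inf, revenue)
    return revenue / budget


# (revenue, budget) pairs copied from the sample rows of this report.
rows = {
    "Toy Story": (373554033.0, 30000000.0),
    "Grumpier Old Men": (0.0, 0.0),
    "Father of the Bride Part II": (76578911.0, 0.0),
    "Sabrina": (0.0, 58000000.0),
}
returns = {title: ieee_ratio(rev, bud) for title, (rev, bud) in rows.items()}

# Skipping NaN but keeping inf reproduces the "Mean | inf" row.
not_missing = [r for r in returns.values() if not math.isnan(r)]
mean_with_inf = sum(not_missing) / len(not_missing)

# Restricting to finite values yields statistics that are usable.
finite = [r for r in not_missing if math.isfinite(r)]
mean_finite = statistics.mean(finite)
```

Whether infinite returns should be dropped, capped, or treated as missing depends on the analysis; the filter here is only one option.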
cast
Categorical
| Distinct | 42663 |
|---|---|
| Distinct (%) | 93.7% |
| Missing | 4 |
| Missing (%) | < 0.1% |
| Memory size | 355.9 KiB |
| [] | 2430 |
|---|---|
| ['Georges Méliès'] | 28 |
| ['Louis Theroux'] | 15 |
| ['Mel Blanc'] | 12 |
| ['Petteri Summanen', 'Ismo Kallio', 'Eppu Salminen', 'Irina Björklund', 'Hannu-Pekka Björkman', 'Jenni Banerjee', 'Mikko Leppilampi', 'Lena Meriläinen', 'Mari Perankoski', 'Risto Kaskilahti'] | 9 |
| Other values (42658) |
Length
| Max length | 5099 |
|---|---|
| Median length | 1498 |
| Mean length | 211.89187 |
| Min length | 2 |
Characters and Unicode
| Total characters | 9649132 |
|---|---|
| Distinct characters | 394 |
| Distinct categories | 14 |
| Distinct scripts | 9 |
| Distinct blocks | 10 |
Unique
| Unique | 42456 |
|---|---|
| Unique (%) | 93.2% |
Sample
| 1st row | ['Tom Hanks', 'Tim Allen', 'Don Rickles', 'Jim Varney', 'Wallace Shawn', 'John Ratzenberger', 'Annie Potts', 'John Morris', 'Erik von Detten', 'Laurie Metcalf', 'R. Lee Ermey', 'Sarah Freeman', 'Penn Jillette'] |
|---|---|
| 2nd row | ['Robin Williams', 'Jonathan Hyde', 'Kirsten Dunst', 'Bradley Pierce', 'Bonnie Hunt', 'Bebe Neuwirth', 'David Alan Grier', 'Patricia Clarkson', 'Adam Hann-Byrd', 'Laura Bell Bundy', 'James Handy', 'Gillian Barber', 'Brandon Obray', 'Cyrus Thiedeke', 'Gary Joseph Thorup', 'Leonard Zola', 'Lloyd Berry', 'Malcolm Stewart', 'Annabel Kershaw', 'Darryl Henriques', 'Robyn Driscoll', 'Peter Bryant', 'Sarah Gilson', 'Florica Vlad', 'June Lion', 'Brenda Lockmuller'] |
| 3rd row | ['Walter Matthau', 'Jack Lemmon', 'Ann-Margret', 'Sophia Loren', 'Daryl Hannah', 'Burgess Meredith', 'Kevin Pollak'] |
| 4th row | ['Whitney Houston', 'Angela Bassett', 'Loretta Devine', 'Lela Rochon', 'Gregory Hines', 'Dennis Haysbert', 'Michael Beach', 'Mykelti Williamson', 'Lamont Johnson', 'Wesley Snipes'] |
| 5th row | ['Steve Martin', 'Diane Keaton', 'Martin Short', 'Kimberly Williams-Paisley', 'George Newbern', 'Kieran Culkin', 'BD Wong', 'Peter Michael Goetz', 'Kate McGregor-Stewart', 'Jane Adams', 'Eugene Levy', 'Lori Alan'] |
Common Values
| Value | Count | Frequency (%) |
| [] | 2430 | 5.3% |
| ['Georges Méliès'] | 28 | 0.1% |
| ['Louis Theroux'] | 15 | < 0.1% |
| ['Mel Blanc'] | 12 | < 0.1% |
| ['Petteri Summanen', 'Ismo Kallio', 'Eppu Salminen', 'Irina Björklund', 'Hannu-Pekka Björkman', 'Jenni Banerjee', 'Mikko Leppilampi', 'Lena Meriläinen', 'Mari Perankoski', 'Risto Kaskilahti'] | 9 | < 0.1% |
| ['Jimmy Carr'] | 9 | < 0.1% |
| ['David Attenborough'] | 8 | < 0.1% |
| ['Louis C.K.'] | 8 | < 0.1% |
| ['George Carlin'] | 8 | < 0.1% |
| ['Werner Herzog'] | 8 | < 0.1% |
| Other values (42653) | 43003 | 94.4% |
Words
| Value | Count | Frequency (%) |
| john | 9723 | 0.8% |
| michael | 7392 | 0.6% |
| david | 6147 | 0.5% |
| james | 5630 | 0.5% |
| robert | 5628 | 0.5% |
| richard | 4402 | 0.4% |
| paul | 4309 | 0.4% |
| peter | 3820 | 0.3% |
| george | 3356 | 0.3% |
| william | 3340 | 0.3% |
| Other values (112107) | 1103904 |
Most occurring characters
| Value | Count | Frequency (%) |
| ' | 1115724 | 11.6% |
| (space) | 1112172 | 11.5% |
| a | 700269 | 7.3% |
| e | 659492 | 6.8% |
| n | 519048 | 5.4% |
| , | 515082 | 5.3% |
| r | 493142 | 5.1% |
| i | 480013 | 5.0% |
| o | 419851 | 4.4% |
| l | 362263 | 3.8% |
| Other values (384) | 3272076 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 5607519 | |
| Other Punctuation | 1646903 | 17.1% |
| Uppercase Letter | 1176548 | 12.2% |
| Space Separator | 1112172 | 11.5% |
| Open Punctuation | 45561 | 0.5% |
| Close Punctuation | 45547 | 0.5% |
| Dash Punctuation | 14098 | 0.1% |
| Other Letter | 543 | < 0.1% |
| Decimal Number | 113 | < 0.1% |
| Final Punctuation | 83 | < 0.1% |
| Other values (4) | 45 | < 0.1% |
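The category labels in this table match Unicode general categories. As a rough sketch of how such counts can be reproduced with the Python standard library (the profiler's own implementation may differ; `CATEGORY_LABELS` and `category_counts` are hypothetical names, and the mapping covers only a subset of categories):

```python
import unicodedata
from collections import Counter

# Subset of Unicode general-category codes mapped to the labels used in the report.
CATEGORY_LABELS = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Po": "Other Punctuation",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
}


def category_counts(text: str) -> Counter:
    """Count characters per general category, labelled like the report where known."""
    codes = (unicodedata.category(ch) for ch in text)
    return Counter(CATEGORY_LABELS.get(code, code) for code in codes)


# One of the common `cast` values, with the accented letters written as escapes.
sample = "['Georges M\u00e9li\u00e8s']"
counts = category_counts(sample)
```

Because each cell is a stringified list, the quote, comma, and bracket characters of the list syntax are counted along with the names, which is why `'` and the space rank above every letter.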
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 700269 | |
| e | 659492 | |
| n | 519048 | |
| r | 493142 | 8.8% |
| i | 480013 | 8.6% |
| o | 419851 | 7.5% |
| l | 362263 | 6.5% |
| s | 254773 | 4.5% |
| t | 251970 | 4.5% |
| h | 196689 | 3.5% |
| Other values (138) | 1270009 |
Other Letter
| Value | Count | Frequency (%) |
| ا | 32 | 5.9% |
| م | 31 | 5.7% |
| ی | 19 | 3.5% |
| ع | 19 | 3.5% |
| ن | 18 | 3.3% |
| د | 17 | 3.1% |
| ر | 17 | 3.1% |
| 松 | 17 | 3.1% |
| ي | 16 | 2.9% |
| 美 | 12 | 2.2% |
| Other values (104) | 345 |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 108635 | 9.2% |
| S | 91734 | 7.8% |
| C | 83023 | 7.1% |
| J | 82744 | 7.0% |
| B | 81492 | 6.9% |
| A | 69895 | 5.9% |
| R | 66833 | 5.7% |
| D | 64308 | 5.5% |
| L | 60860 | 5.2% |
| G | 54401 | 4.6% |
| Other values (81) | 412623 |
Other Punctuation
| Value | Count | Frequency (%) |
| ' | 1115724 | |
| , | 515082 | |
| . | 15881 | 1.0% |
| " | 127 | < 0.1% |
| \ | 62 | < 0.1% |
| · | 9 | < 0.1% |
| : | 6 | < 0.1% |
| & | 6 | < 0.1% |
| ! | 5 | < 0.1% |
| / | 1 | < 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 44 | |
| 5 | 37 | |
| 2 | 14 | 12.4% |
| 1 | 7 | 6.2% |
| 9 | 3 | 2.7% |
| 4 | 2 | 1.8% |
| 3 | 2 | 1.8% |
| 7 | 2 | 1.8% |
| 8 | 1 | 0.9% |
| 6 | 1 | 0.9% |
Nonspacing Mark
| Value | Count | Frequency (%) |
| ́ | 10 | |
| ิ | 2 | 11.8% |
| ี | 1 | 5.9% |
| ์ | 1 | 5.9% |
| ั | 1 | 5.9% |
| ่ | 1 | 5.9% |
| ึ | 1 | 5.9% |
Open Punctuation
| Value | Count | Frequency (%) |
| [ | 45538 | |
| „ | 14 | < 0.1% |
| ( | 9 | < 0.1% |
Final Punctuation
| Value | Count | Frequency (%) |
| ’ | 74 | |
| ” | 6 | 7.2% |
| » | 3 | 3.6% |
Close Punctuation
| Value | Count | Frequency (%) |
| ] | 45538 | |
| ) | 9 | < 0.1% |
Initial Punctuation
| Value | Count | Frequency (%) |
| “ | 20 | |
| « | 3 | 13.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1112172 | |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 14098 |
Currency Symbol
| Value | Count | Frequency (%) |
| $ | 3 |
Modifier Symbol
| Value | Count | Frequency (%) |
| ´ | 2 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 6780983 | |
| Common | 2864505 | |
| Cyrillic | 3070 | < 0.1% |
| Han | 276 | < 0.1% |
| Arabic | 241 | < 0.1% |
| Thai | 27 | < 0.1% |
| Greek | 14 | < 0.1% |
| Inherited | 10 | < 0.1% |
| Hangul | 6 | < 0.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 700269 | 10.3% |
| e | 659492 | 9.7% |
| n | 519048 | 7.7% |
| r | 493142 | 7.3% |
| i | 480013 | 7.1% |
| o | 419851 | 6.2% |
| l | 362263 | 5.3% |
| s | 254773 | 3.8% |
| t | 251970 | 3.7% |
| h | 196689 | 2.9% |
| Other values (163) | 2443473 |
Han
| Value | Count | Frequency (%) |
| 松 | 17 | 6.2% |
| 美 | 12 | 4.3% |
| 龙 | 11 | 4.0% |
| 田 | 11 | 4.0% |
| 长 | 11 | 4.0% |
| 平 | 11 | 4.0% |
| 雅 | 11 | 4.0% |
| 泽 | 11 | 4.0% |
| 森 | 9 | 3.3% |
| 杰 | 9 | 3.3% |
| Other values (55) | 163 |
Cyrillic
| Value | Count | Frequency (%) |
| а | 323 | 10.5% |
| и | 315 | 10.3% |
| о | 233 | 7.6% |
| н | 229 | 7.5% |
| р | 215 | 7.0% |
| е | 174 | 5.7% |
| л | 155 | 5.0% |
| к | 136 | 4.4% |
| т | 115 | 3.7% |
| с | 109 | 3.6% |
| Other values (51) | 1066 |
Common
| Value | Count | Frequency (%) |
| ' | 1115724 | |
| (space) | 1112172 | |
| , | 515082 | |
| ] | 45538 | 1.6% |
| [ | 45538 | 1.6% |
| . | 15881 | 0.6% |
| - | 14098 | 0.5% |
| " | 127 | < 0.1% |
| ’ | 74 | < 0.1% |
| \ | 62 | < 0.1% |
| Other values (24) | 209 | < 0.1% |
Arabic
| Value | Count | Frequency (%) |
| ا | 32 | |
| م | 31 | |
| ی | 19 | 7.9% |
| ع | 19 | 7.9% |
| ن | 18 | 7.5% |
| د | 17 | 7.1% |
| ر | 17 | 7.1% |
| ي | 16 | 6.6% |
| ل | 9 | 3.7% |
| ب | 8 | 3.3% |
| Other values (18) | 55 |
Thai
| Value | Count | Frequency (%) |
| ร | 2 | 7.4% |
| น | 2 | 7.4% |
| ว | 2 | 7.4% |
| ง | 2 | 7.4% |
| ิ | 2 | 7.4% |
| า | 2 | 7.4% |
| ศ | 1 | 3.7% |
| ี | 1 | 3.7% |
| ์ | 1 | 3.7% |
| ค | 1 | 3.7% |
| Other values (11) | 11 |
Hangul
| Value | Count | Frequency (%) |
| 열 | 1 | |
| 계 | 1 | |
| 강 | 1 | |
| 만 | 1 | |
| 병 | 1 | |
| 조 | 1 |
Greek
| Value | Count | Frequency (%) |
| ν | 6 | |
| Ζ | 2 | 14.3% |
| α | 2 | 14.3% |
| ί | 2 | 14.3% |
| ο | 2 | 14.3% |
Inherited
| Value | Count | Frequency (%) |
| ́ | 10 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 9607234 | |
| None | 38098 | 0.4% |
| Cyrillic | 3070 | < 0.1% |
| CJK | 276 | < 0.1% |
| Arabic | 241 | < 0.1% |
| Punctuation | 114 | < 0.1% |
| Latin Ext Additional | 56 | < 0.1% |
| Thai | 27 | < 0.1% |
| Diacriticals | 10 | < 0.1% |
| Hangul | 6 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| ' | 1115724 | 11.6% |
| (space) | 1112172 | 11.6% |
| a | 700269 | 7.3% |
| e | 659492 | 6.9% |
| n | 519048 | 5.4% |
| , | 515082 | 5.4% |
| r | 493142 | 5.1% |
| i | 480013 | 5.0% |
| o | 419851 | 4.4% |
| l | 362263 | 3.8% |
| Other values (68) | 3230178 |
None
| Value | Count | Frequency (%) |
| é | 9068 | |
| á | 4140 | 10.9% |
| í | 2725 | 7.2% |
| ô | 2294 | 6.0% |
| ö | 2039 | 5.4% |
| ó | 1873 | 4.9% |
| ü | 1497 | 3.9% |
| ć | 1296 | 3.4% |
| è | 1243 | 3.3% |
| ä | 1002 | 2.6% |
| Other values (110) | 10921 |
Cyrillic
| Value | Count | Frequency (%) |
| а | 323 | 10.5% |
| и | 315 | 10.3% |
| о | 233 | 7.6% |
| н | 229 | 7.5% |
| р | 215 | 7.0% |
| е | 174 | 5.7% |
| л | 155 | 5.0% |
| к | 136 | 4.4% |
| т | 115 | 3.7% |
| с | 109 | 3.6% |
| Other values (51) | 1066 |
Punctuation
| Value | Count | Frequency (%) |
| ’ | 74 | |
| “ | 20 | 17.5% |
| „ | 14 | 12.3% |
| ” | 6 | 5.3% |
Arabic
| Value | Count | Frequency (%) |
| ا | 32 | |
| م | 31 | |
| ی | 19 | 7.9% |
| ع | 19 | 7.9% |
| ن | 18 | 7.5% |
| د | 17 | 7.1% |
| ر | 17 | 7.1% |
| ي | 16 | 6.6% |
| ل | 9 | 3.7% |
| ب | 8 | 3.3% |
| Other values (18) | 55 |
CJK
| Value | Count | Frequency (%) |
| 松 | 17 | 6.2% |
| 美 | 12 | 4.3% |
| 龙 | 11 | 4.0% |
| 田 | 11 | 4.0% |
| 长 | 11 | 4.0% |
| 平 | 11 | 4.0% |
| 雅 | 11 | 4.0% |
| 泽 | 11 | 4.0% |
| 森 | 9 | 3.3% |
| 杰 | 9 | 3.3% |
| Other values (55) | 163 |
Latin Ext Additional
| Value | Count | Frequency (%) |
| ễ | 15 | |
| ạ | 9 | |
| ỳ | 6 | 10.7% |
| ị | 6 | 10.7% |
| ế | 5 | 8.9% |
| ả | 4 | 7.1% |
| ề | 4 | 7.1% |
| ỗ | 4 | 7.1% |
| ầ | 2 | 3.6% |
| ố | 1 | 1.8% |
Diacriticals
| Value | Count | Frequency (%) |
| ́ | 10 |
Thai
| Value | Count | Frequency (%) |
| ร | 2 | 7.4% |
| น | 2 | 7.4% |
| ว | 2 | 7.4% |
| ง | 2 | 7.4% |
| ิ | 2 | 7.4% |
| า | 2 | 7.4% |
| ศ | 1 | 3.7% |
| ี | 1 | 3.7% |
| ์ | 1 | 3.7% |
| ค | 1 | 3.7% |
| Other values (11) | 11 |
Hangul
| Value | Count | Frequency (%) |
| 열 | 1 | |
| 계 | 1 | |
| 강 | 1 | |
| 만 | 1 | |
| 병 | 1 | |
| 조 | 1 |
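The `cast` values are lists serialized as strings, which is why the column is profiled as a high-cardinality categorical. A minimal sketch of recovering the individual names, assuming every non-missing cell is a valid Python list literal (the report does not guarantee this; `parse_names` is a hypothetical helper):

```python
import ast
from collections import Counter


def parse_names(cell):
    """Turn a stringified list such as "['A', 'B']" into a list of names."""
    if not isinstance(cell, str):  # missing values (None/NaN) become an empty list
        return []
    value = ast.literal_eval(cell)
    return [str(name) for name in value] if isinstance(value, list) else []


cells = [
    "['Walter Matthau', 'Jack Lemmon', 'Ann-Margret']",
    "['Georges M\u00e9li\u00e8s']",
    "[]",
    None,
]
parsed = [parse_names(cell) for cell in cells]

# Frequency of lower-cased word tokens, in the spirit of the word table above.
words = Counter(
    word.lower() for names in parsed for name in names for word in name.split()
)
```

`ast.literal_eval` raises `ValueError` or `SyntaxError` on malformed input, so real data may need a guarded call.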
crew
Categorical
| Distinct | 42899 |
|---|---|
| Distinct (%) | 94.2% |
| Missing | 4 |
| Missing (%) | < 0.1% |
| Memory size | 355.9 KiB |
| [] | 805 |
|---|---|
| ['Georges Méliès'] | 36 |
| ['Christian I. Nyby II'] | 13 |
| ['Gerald Thomas', 'Talbot Rothwell'] | 13 |
| ['Frederick Wiseman'] | 12 |
| Other values (42894) |
Length
| Max length | 7473 |
|---|---|
| Median length | 2323 |
| Mean length | 178.30596 |
| Min length | 2 |
Characters and Unicode
| Total characters | 8119697 |
|---|---|
| Distinct characters | 359 |
| Distinct categories | 14 |
| Distinct scripts | 8 |
| Distinct blocks | 9 |
Unique
| Unique | 41752 |
|---|---|
| Unique (%) | 91.7% |
Sample
| 1st row | ['John Lasseter', 'Joss Whedon', 'Andrew Stanton', 'Joel Cohen', 'Alec Sokolow', 'Bonnie Arnold', 'Ed Catmull', 'Ralph Guggenheim', 'Steve Jobs', 'Lee Unkrich', 'Ralph Eggleston', 'Robert Gordon', 'Mary Helen Leasman', 'Kim Blanchette', 'Marilyn McCoppen', 'Randy Newman', 'Dale E. Grahn', 'Robin Cooper', 'John Lasseter', 'Pete Docter', 'Joe Ranft', 'Patsy Bouge', 'Norm DeCarlo', 'Ash Brannon', 'Randy Newman', 'Roman Figun', 'Don Davis', 'James Flamberg', 'Mary Beth Smith', 'Rick Mackay', 'Susan Bradley', 'William Reeves', 'Randy Newman', 'Andrew Stanton', 'Pete Docter', 'Gary Rydstrom', 'Karen Robert Jackson', 'Chris Montan', 'Rich Quade', 'Michael Berenstein', 'Colin Brady', 'Davey Crockett Feiten', 'Angie Glocka', 'Rex Grignon', 'Tom K. Gurney', 'Jimmy Hayward', 'Hal T. Hickel', 'Karen Kiser', 'Anthony B. LaMolinara', 'Guionne Leroy', 'Bud Luckey', 'Les Major', 'Glenn McQueen', 'Mark Oftedal', 'Jeff Pidgeon', 'Jeff Pratt', 'Steve Rabatich', 'Roger Rose', 'Steve Segal', 'Doug Sheppeck', 'Alan Sperling', 'Doug Sweetland', 'David Tart', 'Ken Willard', 'Thomas Porter', 'Mark Thomas Henne', 'Oren Jacob', 'Darwyn Peachey', 'Mitch Prater', 'Brian M. Rosen', 'Sharon Calahan', 'Galyn Susman', 'William Cone', 'Shelley Daniels Lekven', 'Bob Pauley', 'Bud Luckey', 'Andrew Stanton', 'William Cone', 'Steve Johnson', 'Dan Haskett', 'Tom Holloway', 'Jean Gillmore', 'Desirée Mourad', 'Sonoko Konishi', 'Ann M. Rockwell', 'Julie M. McDonald', 'Robin Lee', 'Tom Freeman', 'Ada Cochavi', 'Dana Mulligan', 'Deirdre Morrison', 'Lori Lombardo', 'Ellen Devine', 'Lauren Beth Strogoff', 'Gary Rydstrom', 'Gary Summers', 'Tim Holland', 'Pat Jackson', 'Tom Myers', 'J.R. Grubbs', 'Susan Sanford', 'Susan Popovic', 'Dan Engstrom', 'Ruth Lambert', 'Mickie McGowan'] |
|---|---|
| 2nd row | ['Larry J. Franco', 'Jonathan Hensleigh', 'James Horner', 'Joe Johnston', 'Robert Dalva', 'Nancy Foy', 'Kyle Balda', 'James D. Bissell', 'Scott Kroopf', 'Ted Field', 'Robert W. Cort', 'Thomas E. Ackerman', 'Chris van Allsburg', 'William Teitler', 'Greg Taylor', 'Jim Strain'] |
| 3rd row | ['Howard Deutch', 'Mark Steven Johnson', 'Mark Steven Johnson', 'Jack Keller'] |
| 4th row | ['Forest Whitaker', 'Ronald Bass', 'Ronald Bass', 'Ezra Swerdlow', 'Deborah Schindler', 'Terry McMillan', 'Terry McMillan', 'Terry McMillan', 'Kenneth Edmonds', 'Caron K'] |
| 5th row | ['Alan Silvestri', 'Elliot Davis', 'Nancy Meyers', 'Nancy Meyers', 'Albert Hackett', 'Charles Shyer', 'Adam Bernardi'] |
Common Values
| Value | Count | Frequency (%) |
| [] | 805 | 1.8% |
| ['Georges Méliès'] | 36 | 0.1% |
| ['Christian I. Nyby II'] | 13 | < 0.1% |
| ['Gerald Thomas', 'Talbot Rothwell'] | 13 | < 0.1% |
| ['Frederick Wiseman'] | 12 | < 0.1% |
| ['Charlie Chaplin', 'Charlie Chaplin'] | 12 | < 0.1% |
| ['JP Siili', 'JP Siili'] | 10 | < 0.1% |
| ['Stan Brakhage'] | 10 | < 0.1% |
| ['James Benning'] | 10 | < 0.1% |
| ['William K.L. Dickson ', 'William Heise'] | 9 | < 0.1% |
| Other values (42889) | 44608 | 97.9% |
Words
| Value | Count | Frequency (%) |
| john | 9989 | 1.0% |
| david | 8649 | 0.9% |
| michael | 8201 | 0.8% |
| robert | 6732 | 0.7% |
| james | 4997 | 0.5% |
| paul | 4511 | 0.5% |
| peter | 4495 | 0.5% |
| richard | 4352 | 0.4% |
| mark | 4192 | 0.4% |
| william | 3943 | 0.4% |
| Other values (89565) | 924021 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 938608 | 11.6% |
| ' | 923966 | 11.4% |
| a | 555048 | 6.8% |
| e | 554988 | 6.8% |
| r | 431980 | 5.3% |
| n | 427785 | 5.3% |
| , | 417397 | 5.1% |
| i | 399603 | 4.9% |
| o | 352475 | 4.3% |
| l | 293807 | 3.6% |
| Other values (349) | 2824040 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 4697405 | |
| Other Punctuation | 1381886 | 17.0% |
| Uppercase Letter | 1000334 | 12.3% |
| Space Separator | 938608 | 11.6% |
| Open Punctuation | 45549 | 0.6% |
| Close Punctuation | 45549 | 0.6% |
| Dash Punctuation | 10091 | 0.1% |
| Other Letter | 206 | < 0.1% |
| Decimal Number | 51 | < 0.1% |
| Final Punctuation | 10 | < 0.1% |
| Other values (4) | 8 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 555048 | |
| e | 554988 | |
| r | 431980 | |
| n | 427785 | 9.1% |
| i | 399603 | 8.5% |
| o | 352475 | 7.5% |
| l | 293807 | 6.3% |
| s | 220274 | 4.7% |
| t | 215697 | 4.6% |
| h | 169949 | 3.6% |
| Other values (128) | 1075799 |
Other Letter
| Value | Count | Frequency (%) |
| ا | 9 | 4.4% |
| 진 | 8 | 3.9% |
| 이 | 7 | 3.4% |
| م | 7 | 3.4% |
| 연 | 6 | 2.9% |
| 정 | 6 | 2.9% |
| 아 | 5 | 2.4% |
| د | 5 | 2.4% |
| ی | 4 | 1.9% |
| 성 | 4 | 1.9% |
| Other values (88) | 145 |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 90569 | 9.1% |
| S | 83805 | 8.4% |
| J | 74017 | 7.4% |
| B | 68760 | 6.9% |
| C | 66188 | 6.6% |
| R | 60184 | 6.0% |
| A | 59627 | 6.0% |
| D | 56919 | 5.7% |
| L | 50177 | 5.0% |
| G | 48783 | 4.9% |
| Other values (80) | 341305 |
Other Punctuation
| Value | Count | Frequency (%) |
| ' | 923966 | |
| , | 417397 | |
| . | 40081 | 2.9% |
| \ | 384 | < 0.1% |
| " | 38 | < 0.1% |
| & | 8 | < 0.1% |
| ! | 4 | < 0.1% |
| / | 3 | < 0.1% |
| : | 2 | < 0.1% |
| · | 2 | < 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 5 | 16 | |
| 0 | 12 | |
| 9 | 7 | |
| 8 | 5 | 9.8% |
| 3 | 4 | 7.8% |
| 7 | 3 | 5.9% |
| 2 | 2 | 3.9% |
| 1 | 2 | 3.9% |
Open Punctuation
| Value | Count | Frequency (%) |
| [ | 45538 | |
| ( | 11 | < 0.1% |
Close Punctuation
| Value | Count | Frequency (%) |
| ] | 45538 | |
| ) | 11 | < 0.1% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 10088 | |
| – | 3 | < 0.1% |
Final Punctuation
| Value | Count | Frequency (%) |
| ’ | 8 | |
| ” | 2 | 20.0% |
Nonspacing Mark
| Value | Count | Frequency (%) |
| ́ | 2 | |
| ̃ | 2 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 938608 | |
Initial Punctuation
| Value | Count | Frequency (%) |
| “ | 2 |
Math Symbol
| Value | Count | Frequency (%) |
| | | 1 |
Modifier Symbol
| Value | Count | Frequency (%) |
| ´ | 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 5696699 | |
| Common | 2421749 | |
| Cyrillic | 1006 | < 0.1% |
| Hangul | 133 | < 0.1% |
| Arabic | 52 | < 0.1% |
| Greek | 33 | < 0.1% |
| Han | 21 | < 0.1% |
| Inherited | 4 | < 0.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 555048 | 9.7% |
| e | 554988 | 9.7% |
| r | 431980 | 7.6% |
| n | 427785 | 7.5% |
| i | 399603 | 7.0% |
| o | 352475 | 6.2% |
| l | 293807 | 5.2% |
| s | 220274 | 3.9% |
| t | 215697 | 3.8% |
| h | 169949 | 3.0% |
| Other values (145) | 2075093 |
Hangul
| Value | Count | Frequency (%) |
| 진 | 8 | 6.0% |
| 이 | 7 | 5.3% |
| 연 | 6 | 4.5% |
| 정 | 6 | 4.5% |
| 아 | 5 | 3.8% |
| 성 | 4 | 3.0% |
| 조 | 4 | 3.0% |
| 모 | 4 | 3.0% |
| 박 | 4 | 3.0% |
| 현 | 4 | 3.0% |
| Other values (58) | 81 |
Cyrillic
| Value | Count | Frequency (%) |
| и | 116 | 11.5% |
| а | 92 | 9.1% |
| р | 72 | 7.2% |
| о | 66 | 6.6% |
| е | 58 | 5.8% |
| л | 56 | 5.6% |
| к | 54 | 5.4% |
| н | 54 | 5.4% |
| с | 45 | 4.5% |
| в | 44 | 4.4% |
| Other values (42) | 349 |
Common
| Value | Count | Frequency (%) |
| (space) | 938608 | |
| ' | 923966 | |
| , | 417397 | |
| [ | 45538 | 1.9% |
| ] | 45538 | 1.9% |
| . | 40081 | 1.7% |
| - | 10088 | 0.4% |
| \ | 384 | < 0.1% |
| " | 38 | < 0.1% |
| 5 | 16 | < 0.1% |
| Other values (22) | 95 | < 0.1% |
Greek
| Value | Count | Frequency (%) |
| ς | 4 | 12.1% |
| η | 3 | 9.1% |
| α | 3 | 9.1% |
| Γ | 2 | 6.1% |
| Α | 2 | 6.1% |
| ρ | 2 | 6.1% |
| ι | 2 | 6.1% |
| ά | 2 | 6.1% |
| μ | 2 | 6.1% |
| Φ | 1 | 3.0% |
| Other values (10) | 10 |
Arabic
| Value | Count | Frequency (%) |
| ا | 9 | |
| م | 7 | |
| د | 5 | |
| ی | 4 | |
| ع | 4 | |
| ي | 4 | |
| ن | 3 | 5.8% |
| ح | 3 | 5.8% |
| ل | 3 | 5.8% |
| و | 2 | 3.8% |
| Other values (7) | 8 |
Han
| Value | Count | Frequency (%) |
| 塩 | 2 | |
| 谷 | 2 | |
| 直 | 2 | |
| 義 | 2 | |
| 玛 | 2 | |
| 莫 | 2 | |
| 森 | 2 | |
| 杰 | 2 | |
| 誠 | 1 | 4.8% |
| 中 | 1 | 4.8% |
| Other values (3) | 3 |
Inherited
| Value | Count | Frequency (%) |
| ́ | 2 | |
| ̃ | 2 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 8090386 | |
| None | 28073 | 0.3% |
| Cyrillic | 1006 | < 0.1% |
| Hangul | 133 | < 0.1% |
| Arabic | 52 | < 0.1% |
| CJK | 21 | < 0.1% |
| Punctuation | 15 | < 0.1% |
| Latin Ext Additional | 7 | < 0.1% |
| Diacriticals | 4 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 938608 | 11.6% |
| ' | 923966 | 11.4% |
| a | 555048 | 6.9% |
| e | 554988 | 6.9% |
| r | 431980 | 5.3% |
| n | 427785 | 5.3% |
| , | 417397 | 5.2% |
| i | 399603 | 4.9% |
| o | 352475 | 4.4% |
| l | 293807 | 3.6% |
| Other values (67) | 2794729 |
None
| Value | Count | Frequency (%) |
| é | 7417 | |
| á | 3259 | |
| í | 2067 | 7.4% |
| ó | 1796 | 6.4% |
| ö | 1661 | 5.9% |
| ô | 1407 | 5.0% |
| ü | 964 | 3.4% |
| è | 902 | 3.2% |
| ç | 839 | 3.0% |
| ä | 776 | 2.8% |
| Other values (113) | 6985 |
Cyrillic
| Value | Count | Frequency (%) |
| и | 116 | 11.5% |
| а | 92 | 9.1% |
| р | 72 | 7.2% |
| о | 66 | 6.6% |
| е | 58 | 5.8% |
| л | 56 | 5.6% |
| к | 54 | 5.4% |
| н | 54 | 5.4% |
| с | 45 | 4.5% |
| в | 44 | 4.4% |
| Other values (42) | 349 |
Arabic
| Value | Count | Frequency (%) |
| ا | 9 | |
| م | 7 | |
| د | 5 | |
| ی | 4 | |
| ع | 4 | |
| ي | 4 | |
| ن | 3 | 5.8% |
| ح | 3 | 5.8% |
| ل | 3 | 5.8% |
| و | 2 | 3.8% |
| Other values (7) | 8 |
Hangul
| Value | Count | Frequency (%) |
| 진 | 8 | 6.0% |
| 이 | 7 | 5.3% |
| 연 | 6 | 4.5% |
| 정 | 6 | 4.5% |
| 아 | 5 | 3.8% |
| 성 | 4 | 3.0% |
| 조 | 4 | 3.0% |
| 모 | 4 | 3.0% |
| 박 | 4 | 3.0% |
| 현 | 4 | 3.0% |
| Other values (58) | 81 |
Punctuation
| Value | Count | Frequency (%) |
| ’ | 8 | |
| – | 3 | 20.0% |
| “ | 2 | 13.3% |
| ” | 2 | 13.3% |
Latin Ext Additional
| Value | Count | Frequency (%) |
| ễ | 5 | |
| ạ | 1 | 14.3% |
| ấ | 1 | 14.3% |
CJK
| Value | Count | Frequency (%) |
| 塩 | 2 | |
| 谷 | 2 | |
| 直 | 2 | |
| 義 | 2 | |
| 玛 | 2 | |
| 莫 | 2 | |
| 森 | 2 | |
| 杰 | 2 | |
| 誠 | 1 | 4.8% |
| 中 | 1 | 4.8% |
| Other values (3) | 3 |
Diacriticals
| Value | Count | Frequency (%) |
| ́ | 2 | |
| ̃ | 2 |
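Several `crew` values in the common-values table repeat a name (`['Charlie Chaplin', 'Charlie Chaplin']`) or carry stray whitespace (`'William K.L. Dickson '`). The sketch below shows one possible normalization. It assumes repeated names refer to the same person and can be collapsed, which the report does not confirm; `normalize_crew` is a hypothetical helper.

```python
def normalize_crew(names):
    """Strip stray whitespace and drop repeated names, keeping first-seen order."""
    cleaned = (name.strip() for name in names)
    return list(dict.fromkeys(name for name in cleaned if name))


# Values copied from the common-values table of the `crew` variable.
chaplin = normalize_crew(['Charlie Chaplin', 'Charlie Chaplin'])
dickson = normalize_crew(['William K.L. Dickson ', 'William Heise'])
```

`dict.fromkeys` preserves insertion order, so the first occurrence of each name keeps its position in the list.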
| | belongs_to_collection | budget | genres | id | original_language | overview | popularity | poster_path | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | release_year | return | cast | crew |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ['Toy Story Collection'] | 30000000.0 | ['Animation', 'Comedy', 'Family'] | 862 | en | Led by Woody, Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart, Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner, the duo eventually learns to put aside their differences. | 21.946943 | /rhIRbceoE9lR4veEXuwCC2wARtG.jpg | ['Pixar Animation Studios'] | ['United States of America'] | 1995-10-30 | 373554033.0 | 81.0 | ['English'] | Released | NaN | Toy Story | 7.7 | 5415.0 | 1995.0 | 12.451801 | ['Tom Hanks', 'Tim Allen', 'Don Rickles', 'Jim Varney', 'Wallace Shawn', 'John Ratzenberger', 'Annie Potts', 'John Morris', 'Erik von Detten', 'Laurie Metcalf', 'R. Lee Ermey', 'Sarah Freeman', 'Penn Jillette'] | ['John Lasseter', 'Joss Whedon', 'Andrew Stanton', 'Joel Cohen', 'Alec Sokolow', 'Bonnie Arnold', 'Ed Catmull', 'Ralph Guggenheim', 'Steve Jobs', 'Lee Unkrich', 'Ralph Eggleston', 'Robert Gordon', 'Mary Helen Leasman', 'Kim Blanchette', 'Marilyn McCoppen', 'Randy Newman', 'Dale E. Grahn', 'Robin Cooper', 'John Lasseter', 'Pete Docter', 'Joe Ranft', 'Patsy Bouge', 'Norm DeCarlo', 'Ash Brannon', 'Randy Newman', 'Roman Figun', 'Don Davis', 'James Flamberg', 'Mary Beth Smith', 'Rick Mackay', 'Susan Bradley', 'William Reeves', 'Randy Newman', 'Andrew Stanton', 'Pete Docter', 'Gary Rydstrom', 'Karen Robert Jackson', 'Chris Montan', 'Rich Quade', 'Michael Berenstein', 'Colin Brady', 'Davey Crockett Feiten', 'Angie Glocka', 'Rex Grignon', 'Tom K. Gurney', 'Jimmy Hayward', 'Hal T. Hickel', 'Karen Kiser', 'Anthony B. LaMolinara', 'Guionne Leroy', 'Bud Luckey', 'Les Major', 'Glenn McQueen', 'Mark Oftedal', 'Jeff Pidgeon', 'Jeff Pratt', 'Steve Rabatich', 'Roger Rose', 'Steve Segal', 'Doug Sheppeck', 'Alan Sperling', 'Doug Sweetland', 'David Tart', 'Ken Willard', 'Thomas Porter', 'Mark Thomas Henne', 'Oren Jacob', 'Darwyn Peachey', 'Mitch Prater', 'Brian M. 
Rosen', 'Sharon Calahan', 'Galyn Susman', 'William Cone', 'Shelley Daniels Lekven', 'Bob Pauley', 'Bud Luckey', 'Andrew Stanton', 'William Cone', 'Steve Johnson', 'Dan Haskett', 'Tom Holloway', 'Jean Gillmore', 'Desirée Mourad', 'Sonoko Konishi', 'Ann M. Rockwell', 'Julie M. McDonald', 'Robin Lee', 'Tom Freeman', 'Ada Cochavi', 'Dana Mulligan', 'Deirdre Morrison', 'Lori Lombardo', 'Ellen Devine', 'Lauren Beth Strogoff', 'Gary Rydstrom', 'Gary Summers', 'Tim Holland', 'Pat Jackson', 'Tom Myers', 'J.R. Grubbs', 'Susan Sanford', 'Susan Popovic', 'Dan Engstrom', 'Ruth Lambert', 'Mickie McGowan'] |
| 1 | NaN | 65000000.0 | ['Adventure', 'Fantasy', 'Family'] | 8844 | en | When siblings Judy and Peter discover an enchanted board game that opens the door to a magical world, they unwittingly invite Alan -- an adult who's been trapped inside the game for 26 years -- into their living room. Alan's only hope for freedom is to finish the game, which proves risky as all three find themselves running from giant rhinoceroses, evil monkeys and other terrifying creatures. | 17.015539 | /vzmL6fP7aPKNKPRTFnZmiUfciyV.jpg | ['TriStar Pictures', 'Teitler Film', 'Interscope Communications'] | ['United States of America'] | 1995-12-15 | 262797249.0 | 104.0 | ['English', 'Français'] | Released | Roll the dice and unleash the excitement! | Jumanji | 6.9 | 2413.0 | 1995.0 | 4.043035 | ['Robin Williams', 'Jonathan Hyde', 'Kirsten Dunst', 'Bradley Pierce', 'Bonnie Hunt', 'Bebe Neuwirth', 'David Alan Grier', 'Patricia Clarkson', 'Adam Hann-Byrd', 'Laura Bell Bundy', 'James Handy', 'Gillian Barber', 'Brandon Obray', 'Cyrus Thiedeke', 'Gary Joseph Thorup', 'Leonard Zola', 'Lloyd Berry', 'Malcolm Stewart', 'Annabel Kershaw', 'Darryl Henriques', 'Robyn Driscoll', 'Peter Bryant', 'Sarah Gilson', 'Florica Vlad', 'June Lion', 'Brenda Lockmuller'] | ['Larry J. Franco', 'Jonathan Hensleigh', 'James Horner', 'Joe Johnston', 'Robert Dalva', 'Nancy Foy', 'Kyle Balda', 'James D. Bissell', 'Scott Kroopf', 'Ted Field', 'Robert W. Cort', 'Thomas E. Ackerman', 'Chris van Allsburg', 'William Teitler', 'Greg Taylor', 'Jim Strain'] |
| 2 | ['Grumpy Old Men Collection'] | 0.0 | ['Romance', 'Comedy'] | 15602 | en | A family wedding reignites the ancient feud between next-door neighbors and fishing buddies John and Max. Meanwhile, a sultry Italian divorcée opens a restaurant at the local bait shop, alarming the locals who worry she'll scare the fish away. But she's less interested in seafood than she is in cooking up a hot time with Max. | 11.7129 | /6ksm1sjKMFLbO7UY2i6G1ju9SML.jpg | ['Warner Bros.', 'Lancaster Gate'] | ['United States of America'] | 1995-12-22 | 0.0 | 101.0 | ['English'] | Released | Still Yelling. Still Fighting. Still Ready for Love. | Grumpier Old Men | 6.5 | 92.0 | 1995.0 | NaN | ['Walter Matthau', 'Jack Lemmon', 'Ann-Margret', 'Sophia Loren', 'Daryl Hannah', 'Burgess Meredith', 'Kevin Pollak'] | ['Howard Deutch', 'Mark Steven Johnson', 'Mark Steven Johnson', 'Jack Keller'] |
| 3 | NaN | 16000000.0 | ['Comedy', 'Drama', 'Romance'] | 31357 | en | Cheated on, mistreated and stepped on, the women are holding their breath, waiting for the elusive "good man" to break a string of less-than-stellar lovers. Friends and confidants Vannah, Bernie, Glo and Robin talk it all out, determined to find a better way to breathe. | 3.859495 | /16XOMpEaLWkrcPqSQqhTmeJuqQl.jpg | ['Twentieth Century Fox Film Corporation'] | ['United States of America'] | 1995-12-22 | 81452156.0 | 127.0 | ['English'] | Released | Friends are the people who let you be yourself... and never let you forget it. | Waiting to Exhale | 6.1 | 34.0 | 1995.0 | 5.090760 | ['Whitney Houston', 'Angela Bassett', 'Loretta Devine', 'Lela Rochon', 'Gregory Hines', 'Dennis Haysbert', 'Michael Beach', 'Mykelti Williamson', 'Lamont Johnson', 'Wesley Snipes'] | ['Forest Whitaker', 'Ronald Bass', 'Ronald Bass', 'Ezra Swerdlow', 'Deborah Schindler', 'Terry McMillan', 'Terry McMillan', 'Terry McMillan', 'Kenneth Edmonds', 'Caron K'] |
| 4 | ['Father of the Bride Collection'] | 0.0 | ['Comedy'] | 11862 | en | Just when George Banks has recovered from his daughter's wedding, he receives the news that she's pregnant ... and that George's wife, Nina, is expecting too. He was planning on selling their home, but that's a plan that -- like George -- will have to change with the arrival of both a grandchild and a kid of his own. | 8.387519 | /e64sOI48hQXyru7naBFyssKFxVd.jpg | ['Sandollar Productions', 'Touchstone Pictures'] | ['United States of America'] | 1995-02-10 | 76578911.0 | 106.0 | ['English'] | Released | Just When His World Is Back To Normal... He's In For The Surprise Of His Life! | Father of the Bride Part II | 5.7 | 173.0 | 1995.0 | inf | ['Steve Martin', 'Diane Keaton', 'Martin Short', 'Kimberly Williams-Paisley', 'George Newbern', 'Kieran Culkin', 'BD Wong', 'Peter Michael Goetz', 'Kate McGregor-Stewart', 'Jane Adams', 'Eugene Levy', 'Lori Alan'] | ['Alan Silvestri', 'Elliot Davis', 'Nancy Meyers', 'Nancy Meyers', 'Albert Hackett', 'Charles Shyer', 'Adam Bernardi'] |
| 5 | NaN | 60000000.0 | ['Action', 'Crime', 'Drama', 'Thriller'] | 949 | en | Obsessive master thief, Neil McCauley leads a top-notch crew on various insane heists throughout Los Angeles while a mentally unstable detective, Vincent Hanna pursues him without rest. Each man recognizes and respects the ability and the dedication of the other even though they are aware their cat-and-mouse game may end in violence. | 17.924927 | /zMyfPUelumio3tiDKPffaUpsQTD.jpg | ['Regency Enterprises', 'Forward Pass', 'Warner Bros.'] | ['United States of America'] | 1995-12-15 | 187436818.0 | 170.0 | ['English', 'Español'] | Released | A Los Angeles Crime Saga | Heat | 7.7 | 1886.0 | 1995.0 | 3.123947 | ['Al Pacino', 'Robert De Niro', 'Val Kilmer', 'Jon Voight', 'Tom Sizemore', 'Diane Venora', 'Amy Brenneman', 'Ashley Judd', 'Mykelti Williamson', 'Natalie Portman', 'Ted Levine', 'Tom Noonan', 'Tone Loc', 'Hank Azaria', 'Wes Studi', 'Dennis Haysbert', 'Danny Trejo', 'Henry Rollins', 'William Fichtner', 'Kevin Gage', 'Susan Traylor', 'Jerry Trimble', 'Ricky Harris', 'Jeremy Piven', 'Xander Berkeley', 'Begonya Plaza', 'Rick Avery', 'Hazelle Goodman', 'Ray Buktenica', 'Max Daniels', 'Vince Deadrick Jr.', 'Steven Ford', 'Farrah Forke', 'Patricia Healy', 'Paul Herman', 'Cindy Katz', 'Brian Libby', 'Dan Martin', 'Mario Roberts', 'Thomas Rosales, Jr.', 'Yvonne Zima', 'Mick Gould', 'Bud Cort', 'Viviane Vives', 'Kim Staunton', 'Martin Ferrero', 'Brad Baldridge', 'Andrew Camuccio', 'Kenny Endoso', 'Kimberly Flynn', 'Niki Harris', 'Bill McIntosh', 'Rick Marzan', 'Terry Miller', 'Kai Soremekun', 'Peter Blackwell', 'Trevor Coppola', 'Mary Kircher', 'Darin Mangan', 'Robert Miranda', 'Manny Perry', 'Iva Franks Singer', 'Tim Werner', 'Philip Ettington'] | ['Michael Mann', 'Michael Mann', 'Art Linson', 'Michael Mann', 'Elliot Goldenthal', 'Dante Spinotti', 'Pasquale Buba', 'William Goldenberg', 'Dov Hoenig', 'Tom Rolf', 'Bonnie Timmermann', 'Neil Spisak', 'Margie Stone McShirley', 'Deborah Lynn Scott', 
'Bill Abbott', 'Per Hallberg', 'Terry D. Frazee', 'Paul H. Haines Jr.', 'Neil Krepela', 'Joel Kramer', 'Tony Brubaker', 'Anne H. Ahrens', 'Darryl M. Athons', 'Cate Hardman', 'Jane Brody', 'Donald Frazee', 'Oscar Mazzola', 'Dianne Wager', 'Anthony Lattanzio', 'David Le Vey', 'Leonard Engelman', 'Ilona Herman', 'Vera Mitchell', 'John Caglione Jr.', 'Ken Diaz', 'Neal J. Anderson', 'Duncan Burns', 'Hector C. Gika', 'Larry Kemp', 'Lauren Stephens', 'Gary Jay', 'James Muro', 'Frank Connor', 'Duane Manwiller', 'Chris Moseley', 'Frank Dorowsky', 'Michael Connell', 'Budd Carr', 'Matthew Booth', 'Vicki Hiatt', 'Thomas R. Bryant', 'Ray Boniker', 'Anna Behlmer', 'Ron Bartlett', 'Chris Jenkins', 'Andy Nelson', 'Mark Smith', 'Mick Gould', 'Tim Werner', 'Pieter Jan Brugge', 'Gusmano Cesaretti', 'Arnon Milchan', 'Christopher Cronyn', 'Michael Waxman', 'Alison E. McBryde', 'Marsha Bozeman', 'Jeff Wells', 'Doug Coleman', 'Philip Rogers', 'Jimmy Webb'] |
| 6 | NaN | 58000000.0 | ['Comedy', 'Romance'] | 11860 | en | An ugly duckling having undergone a remarkable change, still harbors feelings for her crush: a carefree playboy, but not before his business-focused brother has something to say about it. | 6.677277 | /jQh15y5YB7bWz1NtffNZmRw0s9D.jpg | ['Paramount Pictures', 'Scott Rudin Productions', 'Mirage Enterprises', 'Sandollar Productions', 'Constellation Entertainment', 'Worldwide', 'Mont Blanc Entertainment GmbH'] | ['Germany', 'United States of America'] | 1995-12-15 | 0.0 | 127.0 | ['Français', 'English'] | Released | You are cordially invited to the most surprising merger of the year. | Sabrina | 6.2 | 141.0 | 1995.0 | 0.000000 | ['Harrison Ford', 'Julia Ormond', 'Greg Kinnear', 'Angie Dickinson', 'Nancy Marchand', 'John Wood', 'Richard Crenna', 'Lauren Holly', 'Dana Ivey', 'Fanny Ardant', 'Patrick Bruel', 'Paul Giamatti', 'Miriam Colón', 'Elizabeth Franz', 'Valérie Lemercier', 'Becky Ann Baker', 'John C. Vennema', 'Margo Martindale', 'J. Smith-Cameron', 'Christine Luneau-Lipton', 'Michael Dees', 'Denis Holmes', 'Jo-Jo Lowe', 'Ira Wheeler', 'Philippa Cooper', 'Ayako Kawahara', 'François Genty', 'Guillaume Gallienne', 'Inés Sastre', 'Phina Oruche', 'Andrea Behalikova', 'Jennifer Herrera', 'Kristina Kumlin', 'Eva Linderholm', 'Carmen Chaplin', 'Micheline Van de Velde', 'Joanna Rhodes', 'Alan Boone', 'Patrick Forster-Delmas', 'Kentaro Matsuo', 'Peter McKernan', 'Ed Connelly', 'Ronald L. Schwary', 'Alvin Lum', 'Siching Song', 'Phil Nee', 'Randy Becker', 'Susan Browning', 'Anthony Mondal', 'Peter Parks', 'Woodrow Asai', 'Eric Bruno Borgman', 'Michael Cline', 'Christopher Del Gaudio', 'Philippe Hartmann', 'Jerry Quinn', 'Dori Rosenthal'] | ['Sydney Pollack', 'Barbara Benedek', 'Sydney Pollack', 'John Williams', 'Fredric Steinkamp', 'Scott Rudin', 'David Rubin', 'Brian Morris', 'David Rayfiel', 'Peter Robb-King', 'Bernadette Mazur', 'Joseph A. Campayno', 'Lynda Gurasich', 'Stephen G. Bishop', 'Gary Jones', 'Ann Roth', 'George DeTitta Jr.', 'Amy Marshall', 'Miriam Schapiro', 'Danny Michael', 'Adam Jenkins', 'Chris Jenkins', 'Scott Millan', 'Myron Nettinga', 'Mitch Gettleman', 'Joe Earle', 'J. Paul Huntsman', 'Andrew Schmetterling', 'Adam Sawelson', 'Barbara Issak', 'Benjamin Beardwood', 'Mary A. Kelly', 'Myles Aronowitz', 'Brian Hamill', 'Giovanni Fiore Coltellacci', 'Giuseppe Rotunno', 'Kate Dowd', 'Juliet Polcsa', 'Michelle Matland', 'Donna Maloney', 'Karl F. Steinkamp', 'Lindsay Doran', 'Ronald L. Schwary', 'John Kasarda', 'Jean-Pierre Avice', 'Thomas A. Imperato', 'Ronald L. Schwary', 'Bill Kaufman', 'Ronna Kress', 'Sandrine Ageorges', 'Joseph E. Iberti', 'Joanny Carpentier', 'Katherine Kennedy'] |
| 7 | NaN | 0.0 | ['Action', 'Adventure', 'Drama', 'Family'] | 45325 | en | A mischievous young boy, Tom Sawyer, witnesses a murder by the deadly Injun Joe. Tom becomes friends with Huckleberry Finn, a boy with no future and no family. Tom has to choose between honoring a friendship or honoring an oath because the town alcoholic is accused of the murder. Tom and Huck go through several adventures trying to retrieve evidence. | 2.561161 | /sGO5Qa55p7wTu7FJcX4H4xIVKvS.jpg | ['Walt Disney Pictures'] | ['United States of America'] | 1995-12-22 | 0.0 | 97.0 | ['English', 'Deutsch'] | Released | The Original Bad Boys. | Tom and Huck | 5.4 | 45.0 | 1995.0 | NaN | ['Jonathan Taylor Thomas', 'Brad Renfro', 'Rachael Leigh Cook', 'Michael McShane', 'Amy Wright', 'Eric Schweig', 'Tamara Mello'] | ['David Loughery', 'Stephen Sommers', 'Peter Hewitt', 'Mark Twain'] |
| 8 | NaN | 35000000.0 | ['Action', 'Adventure', 'Thriller'] | 9091 | en | International action superstar Jean Claude Van Damme teams with Powers Boothe in a Tension-packed, suspense thriller, set against the back-drop of a Stanley Cup game.Van Damme portrays a father whose daughter is suddenly taken during a championship hockey game. With the captors demanding a billion dollars by game's end, Van Damme frantically sets a plan in motion to rescue his daughter and abort an impending explosion before the final buzzer... | 5.23158 | /eoWvKD60lT95Ss1MYNgVExpo5iU.jpg | ['Universal Pictures', 'Imperial Entertainment', 'Signature Entertainment'] | ['United States of America'] | 1995-12-22 | 64350171.0 | 106.0 | ['English'] | Released | Terror goes into overtime. | Sudden Death | 5.5 | 174.0 | 1995.0 | 1.838576 | ['Jean-Claude Van Damme', 'Powers Boothe', 'Dorian Harewood', 'Raymond J. Barry', 'Ross Malinger', 'Whittni Wright'] | ['Peter Hyams', 'Karen Elise Baldwin', 'Gene Quintano', 'Moshe Diamant', 'Anders P. Jensen', 'Howard Baldwin', 'John Debney', 'Peter Hyams', 'Steven Kemper'] |
| 9 | ['James Bond Collection'] | 58000000.0 | ['Adventure', 'Action', 'Thriller'] | 710 | en | James Bond must unmask the mysterious head of the Janus Syndicate and prevent the leader from utilizing the GoldenEye weapons system to inflict devastating revenge on Britain. | 14.686036 | /5c0ovjT41KnYIHYuF4AWsTe3sKh.jpg | ['United Artists', 'Eon Productions'] | ['United Kingdom', 'United States of America'] | 1995-11-16 | 352194034.0 | 130.0 | ['English', 'Pусский', 'Español'] | Released | No limits. No fears. No substitutes. | GoldenEye | 6.6 | 1194.0 | 1995.0 | 6.072311 | ['Pierce Brosnan', 'Sean Bean', 'Izabella Scorupco', 'Famke Janssen', 'Joe Don Baker', 'Judi Dench', 'Gottfried John', 'Robbie Coltrane', 'Alan Cumming', 'Tchéky Karyo', 'Desmond Llewelyn', 'Samantha Bond', 'Michael Kitchen', 'Serena Gordon', 'Simon Kunz', 'Billy J. Mitchell', 'Constantine Gregory', 'Minnie Driver', 'Michelle Arthur', 'Ravil Isyanov'] | ['Martin Campbell', 'Ian Fleming', 'Jeffrey Caine', 'Bruce Feirstein', 'Barbara Broccoli', 'Tom Pevsner', 'Eric Serra', 'Tina Turner', 'Phil Meheux', 'Terry Rawlings', 'Debbie McWilliams', 'Peter Lamont', 'Andrew Ackland-Snow', 'Kathrin Brunner', 'Charles Dwight Lee', 'Michael Ford', 'Lindy Hemming', 'Michael G. Wilson', 'Anthony Waye', 'Michael France', 'Michael Boone', 'Steven Lawrence', 'Tony Graysmark', 'Neil Lamont', 'Pam Dixon', 'Robert Hathaway', 'Charles Bodycomb', 'June Randall', 'Harvey Harrison', 'Roger Pearce', 'Herbert Raditschnig', 'Tim Wooster', 'Keith Hamshere', 'George Whitear', 'Bill Pochetty', 'Luigi Bisioli', 'Steve Foster', 'Chris Corbould', 'Mara Bryan', 'Tim Grover', 'Peter Musgrave', 'Michael A. Carter', 'Graham V. Hartstone', 'John Hayward', 'Jim Shields', 'David John'] |
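
The report does not state how the `return` column was derived, but the sample rows above are consistent with `revenue / budget` under floating-point division: a zero revenue over a non-zero budget gives 0, zero over zero gives NaN, and a positive revenue over a zero budget gives `inf`. The following is a minimal sketch of that assumption in plain Python; the function name `revenue_to_budget_ratio` is hypothetical and is not taken from the report or from any library.

```python
import math


def revenue_to_budget_ratio(revenue: float, budget: float) -> float:
    """Divide revenue by budget, mimicking IEEE-754 results instead of raising.

    Assumption: the report's `return` column is revenue / budget.
    """
    if budget == 0:
        # 0 / 0 (or NaN / 0) is undefined, so it maps to NaN.
        if revenue == 0 or math.isnan(revenue):
            return math.nan
        # A non-zero revenue over a zero budget maps to a signed infinity.
        return math.copysign(math.inf, revenue)
    return revenue / budget


# (revenue, budget) pairs copied from the sample rows above.
samples = {
    "Sabrina": (0.0, 58000000.0),
    "Tom and Huck": (0.0, 0.0),
    "Sudden Death": (64350171.0, 35000000.0),
    "GoldenEye": (352194034.0, 58000000.0),
}

for title, (revenue, budget) in samples.items():
    print(title, round(revenue_to_budget_ratio(revenue, budget), 6))
```

If this reading is right, the infinite and missing `return` values flagged in the report's alerts would largely reflect rows whose budget is recorded as 0, so they may need masking before any aggregate statistics on that column.
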
| | belongs_to_collection | budget | genres | id | original_language | overview | popularity | poster_path | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | release_year | return | cast | crew |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 45532 | NaN | 0.0 | ['Horror', 'Mystery', 'Thriller'] | 84419 | en | An unsuccessful sculptor saves a madman named "The Creeper" from drowning. Seeing an opportunity for revenge, he tricks the psycho into murdering his critics. | 0.222814 | /yMnq9mL5uYxbRgwKqyz1cVGCJYJ.jpg | ['Universal Pictures'] | ['United States of America'] | 1946-03-29 | 0.0 | 65.0 | ['English'] | Released | Meet...The CREEPER! | House of Horrors | 6.3 | 8.0 | 1946.0 | NaN | ['Rondo Hatton', 'Robert Lowery', 'Virginia Grey', 'Bill Goodwin', 'Martin Kosleck', 'Alan Napier', 'Howard Freeman', 'Virginia Christine', 'Joan Shawlee', 'Byron Foulger', 'Syd Saylor'] | ['Russell A. Gausman', 'John B. Goodman', 'Jack P. Pierce', 'Philip Cahn', 'Jean Yarbrough', 'George Bricker', 'Maury Gertsman', 'Dwight V. Babcock', 'Ben Pivar', 'Abraham Grossman', 'Ralph Warrington'] |
| 45533 | NaN | 0.0 | ['Mystery', 'Horror'] | 390959 | en | In this true-crime documentary, we delve into the murder spree that was the inspiration for Joe Berlinger's "Book of Shadows: Blair Witch 2". | 0.076061 | /q75tCM4pFmUzdCg0gqcOQquCaYf.jpg | [] | [] | 2000-10-22 | 0.0 | 45.0 | ['English'] | Released | NaN | Shadow of the Blair Witch | 7.0 | 2.0 | 2000.0 | NaN | ['Tony Abatemarco', 'Andre Brooks', 'Mariclare Costello', 'Bill Dreggors', 'Apollo Dukakis', 'Philip Friedman', 'James Gleason', 'Dilva Henry', 'Bari Hochwald', 'Wendy Hoffman', 'John Huck', 'Rachel Moskowitz', 'Sandy Mulvihill', 'Roger Nolan', 'Chris Parnell', 'Byrne Piven', 'Richard Sexton', 'Rich Williams', 'Ray Xifo'] | ['Ben Rock', 'Ben Rock', 'Jay Bogdanowitsch', 'Pirie Jones', 'Kimberly Rach', 'Ben Rock', 'Sasha Bogdanowitsch', 'Neal Fredericks', 'George Rizkallah', 'Eddie Dunlop', 'David Giella', 'Steven P. Duchscherer', 'Chris Davis', 'Kimberly Eckhout', 'Noelle Polard', 'Noelle Polard', 'Kimberly Eckhout', 'Hillary Wallace', 'Hillary Wallace', 'Craig Borden', 'Renelouise Smith', 'Aaron Walters', 'Shaun Richkind', 'Jeremy M. Gilleece', 'Jeremy M. Gilleece', 'Jackson Hilliard', 'James Grossman', 'Dale Obert', 'Ann Roth'] |
| 45534 | NaN | 0.0 | ['Horror'] | 289923 | en | A film archivist revisits the story of Rustin Parr, a hermit thought to have murdered seven children while under the possession of the Blair Witch. | 0.38645 | /lXtoHVdej6kS1Dc7KAhw05sMos9.jpg | ['Neptune Salad Entertainment', 'Pirie Productions'] | ['United States of America'] | 2000-10-03 | 0.0 | 30.0 | ['English'] | Released | Do you know what happened 50 years before "The Blair Witch Project"? | The Burkittsville 7 | 7.0 | 1.0 | 2000.0 | NaN | ['Monty Bane', 'Lucy Butler', 'David Grammer', 'Bill Dreggors', 'Frank Pastor', 'Heather Donahue', 'Joshua Leonard', 'Michael C. Williams'] | ['Ben Rock', 'Ben Rock'] |
| 45535 | NaN | 0.0 | ['Science Fiction'] | 222848 | en | It's the year 3000 AD. The world's most dangerous women are banished to a remote asteroid 45 million light years from earth. Kira Murphy doesn't belong; wrongfully accused of a crime she did not commit, she's thrown in this interplanetary prison and left to her own defenses. But Kira's a fighter, and soon she finds herself in the middle of a female gang war; where everyone wants a piece of the action... and a piece of her! "Caged Heat 3000" takes the Women-in-Prison genre to a whole new level... and a whole new galaxy! | 0.661558 | /4lF9LH0b0Z1X94xGK9IOzqEW6k1.jpg | ['Concorde-New Horizons'] | ['United States of America'] | 1995-01-01 | 0.0 | 85.0 | ['English'] | Released | NaN | Caged Heat 3000 | 3.5 | 1.0 | 1995.0 | NaN | ['Lisa Boyle', 'Kena Land', 'Zaneta Polard', 'Don Yanan', 'Debra K. Beatty', 'Mark Sikes', 'Robert J. Ferrelli', 'Ellyn Dawn Humphreys', 'Ron Jeremy', 'Ben Ramsey'] | ['Roger Corman', 'Mike Elliott', 'Aaron Osborne', 'Mike Upton', 'Emile Dupont', 'Felix Chamberlain'] |
| 45536 | NaN | 0.0 | ['Drama', 'Action', 'Romance'] | 30840 | en | Yet another version of the classic epic, with enough variation to make it interesting. The story is the same, but some of the characters are quite different from the usual, in particular Uma Thurman's very special maid Marian. The photography is also great, giving the story a somewhat darker tone. | 5.683753 | /fQC46NglNiEMZBv5XHoyLuOWoN5.jpg | ['Westdeutscher Rundfunk (WDR)', 'Working Title Films', '20th Century Fox Television', 'CanWest Global Communications'] | ['Canada', 'Germany', 'United Kingdom', 'United States of America'] | 1991-05-13 | 0.0 | 104.0 | ['English'] | Released | NaN | Robin Hood | 5.7 | 26.0 | 1991.0 | NaN | ['Patrick Bergin', 'Uma Thurman', 'David Morrissey', 'Jürgen Prochnow', 'Jeroen Krabbé'] | ['John Irvin', 'Sam Resnick', 'John McGrath', 'Sam Resnick', 'Sarah Radclyffe', 'Geoffrey Burgon', 'Jason Lehel', 'Peter Tanner', 'Susie Figgis'] |
| 45537 | NaN | 0.0 | ['Drama', 'Family'] | 439050 | fa | Rising and falling between a man and woman. | 0.072051 | /jldsYflnId4tTWPx8es3uzsB1I8.jpg | [] | ['Iran'] | NaN | 0.0 | 90.0 | ['فارسی'] | Released | Rising and falling between a man and woman | Subdue | 4.0 | 1.0 | NaN | NaN | ['Leila Hatami', 'Kourosh Tahami', 'Elham Korda'] | ['Hamid Nematollah', 'Hamid Nematollah', 'Farshad Mohammadi', 'Masoumeh Bayat', 'Mehdi Saadi', 'Babak Ardalan', 'Azadeh Ghavam', 'Sahand Torabi', 'Homayoun Shajarian'] |
| 45538 | NaN | 0.0 | ['Drama'] | 111109 | tl | An artist struggles to finish his work while a storyline about a cult plays in his head. | 0.178241 | /xZkmxsNmYXJbKVsTRLLx3pqGHx7.jpg | ['Sine Olivia'] | ['Philippines'] | 2011-11-17 | 0.0 | 360.0 | [''] | Released | NaN | Century of Birthing | 9.0 | 3.0 | 2011.0 | NaN | ['Angel Aquino', 'Perry Dizon', 'Hazel Orencio', 'Joel Torre', 'Bart Guingona', 'Soliman Cruz ', 'Roeder', 'Angeli Bayani', 'Dante Perez', 'Betty Uy-Regala', 'Modesta'] | ['Lav Diaz', 'Lav Diaz', 'Dante Perez', 'Lav Diaz', 'Lav Diaz', 'Lav Diaz'] |
| 45539 | NaN | 0.0 | ['Action', 'Drama', 'Thriller'] | 67758 | en | When one of her hits goes wrong, a professional assassin ends up with a suitcase full of a million dollars belonging to a mob boss ... | 0.903007 | /d5bX92nDsISNhu3ZT69uHwmfCGw.jpg | ['American World Pictures'] | ['United States of America'] | 2003-08-01 | 0.0 | 90.0 | ['English'] | Released | A deadly game of wits. | Betrayal | 3.8 | 6.0 | 2003.0 | NaN | ['Erika Eleniak', 'Adam Baldwin', 'Julie du Page', 'James Remar', 'Damian Chapa', 'Louis Mandylor', 'Tom Wright', 'Jeremy Lelliott', 'James Quattrochi', 'Jason Widener', 'Joe Sabatino', 'Kiko Ellsworth', 'Don Swayze', 'Peter Dobson', 'Darrell Dubovsky'] | ['Mark L. Lester', 'C. Courtney Joyner', 'Jeffrey Goldenberg', 'Richard McHugh', 'João Fernandes'] |
| 45540 | NaN | 0.0 | [] | 227506 | en | In a small town live two brothers, one a minister and the other one a hunchback painter of the chapel who lives with his wife. One dreadful and stormy night, a stranger knocks at the door asking for shelter. The stranger talks about all the good things of the earthly life the minister is missing because of his puritanical faith. The minister comes to accept the stranger's viewpoint but it is others who will pay the consequences because the minister will discover the human pleasures thanks to, ehem, his sister- in -law… The tormented minister and his cuckolded brother will die in a strange accident in the chapel and later an infant will be born from the minister's adulterous relationship. | 0.003503 | /aorBPO7ak8e8iJKT5OcqYxU3jlK.jpg | ['Yermoliev'] | ['Russia'] | 1917-10-21 | 0.0 | 87.0 | [] | Released | NaN | Satan Triumphant | 0.0 | 0.0 | 1917.0 | NaN | ['Iwan Mosschuchin', 'Nathalie Lissenko', 'Pavel Pavlov', 'Aleksandr Chabrov', 'Vera Orlova'] | ['Yakov Protazanov', 'Joseph N. Ermolieff'] |
| 45541 | NaN | 0.0 | [] | 461257 | en | 50 years after decriminalisation of homosexuality in the UK, director Daisy Asquith mines the jewels of the BFI archive to take us into the relationships, desires, fears and expressions of gay men and women in the 20th century. | 0.163015 | /s5UkZt6NTsrS7ZF0Rh8nzupRlIU.jpg | [] | ['United Kingdom'] | 2017-06-09 | 0.0 | 75.0 | ['English'] | Released | NaN | Queerama | 0.0 | 0.0 | 2017.0 | NaN | [] | ['Daisy Asquith'] |
Most frequently occurring
| | belongs_to_collection | budget | genres | id | original_language | overview | poster_path | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | release_year | return | cast | crew | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 34 | NaN | 0.0 | ['Thriller', 'Mystery'] | 141971 | fi | Recovering from a nail gun shot to the head and 13 months of coma, doctor Pekka Valinta starts to unravel the mystery of his past, still suffering from total amnesia. | /8VSZ9coCzxOCW2wE2Qene1H1fKO.jpg | ['Filmiteollisuus Fine'] | ['Finland'] | 2008-12-26 | 0.0 | 108.0 | ['suomi'] | Released | Which one is the first to return - memory or the murderer? | Blackout | 6.7 | 3.0 | 2008.0 | NaN | ['Petteri Summanen', 'Ismo Kallio', 'Eppu Salminen', 'Irina Björklund', 'Hannu-Pekka Björkman', 'Jenni Banerjee', 'Mikko Leppilampi', 'Lena Meriläinen', 'Mari Perankoski', 'Risto Kaskilahti'] | ['JP Siili', 'JP Siili'] | 9 |
| 3 | ['Pokémon Collection'] | 0.0 | ['Adventure', 'Fantasy', 'Animation', 'Science Fiction', 'Family'] | 12600 | ja | All your favorite Pokémon characters are back, and are joined for the first time by the legendary Pokémon Celebi and Suicune, in this latest exciting Pokémon adventure! In order to escape a greedy Pokémon hunter, Celebi must use the last of its energy to travel through time to the present day. Celebi brings along Sammy, a boy who had been trying to protect it. Along with Ash, Pikachu, and the rest of the gang, Sammy and Celebi encounter an enemy far more advanced than the hunter left behind in the past. This new enemy possesses a Pokéball called a “Dark Ball,” which transforms the Pokémon it captures into evil and far stronger creatures. When Celebi is captured, the fate of the entire forest is threatened. Let POKÉMON 4EVER transport you to a world of adventure as Ash, Suicune and the rest take action to save the day! | /bqL0PVHbQ8Jmw3Njcl38kW0CoeM.jpg | [] | ['Japan', 'United States of America'] | 2001-07-06 | 28023563.0 | 75.0 | ['日本語'] | Released | NaN | Pokémon 4Ever: Celebi - Voice of the Forest | 5.7 | 82.0 | 2001.0 | inf | ['Veronica Taylor', 'Rachael Lillis', 'Maddie Blaustein', 'Ikue Ōtani'] | ['Hisao Shirai', 'Kunihiko Yuyama', 'Choji Yoshikawa', 'Norman J. Grossfeld', 'Alfred R. Kahn', 'Takashi Kawaguchi', 'Masakazu Kubo', 'Yukako Matsusako', 'Takemoto Mori', 'Jim Malone', 'Hideki Sonoda', 'Shinji Miyazaki', 'Yumiko Fuse', 'Toshio Henmi', 'Yutaka Henmi', 'Yutaka Ita', 'Yukiko Nojiri'] | 4 |
| 11 | NaN | 0.0 | ['Action', 'Horror', 'Science Fiction'] | 18440 | en | When a comet strikes Earth and kicks up a cloud of toxic dust, hundreds of humans join the ranks of the living dead. But there's bad news for the survivors: The newly minted zombies are hell-bent on eradicating every last person from the planet. For the few human beings who remain, going head to head with the flesh-eating fiends is their only chance for long-term survival. Yet their battle will be dark and cold, with overwhelming odds. | /tWCyKXHuSrQdLAvNeeVJBnhf1Yv.jpg | [] | ['United States of America'] | 2007-01-01 | 0.0 | 89.0 | ['English'] | Released | NaN | Days of Darkness | 5.0 | 5.0 | 2007.0 | NaN | ['Sabrina Gennarino', 'Tom Eplin'] | ['Jake Kennedy', 'Jake Kennedy'] | 4 |
| 12 | NaN | 0.0 | ['Adventure', 'Animation', 'Drama', 'Action', 'Foreign'] | 23305 | en | In feudal India, a warrior (Khan) who renounces his role as the longtime enforcer to a local lord becomes the prey in a murderous hunt through the Himalayan mountains. | /9GlrmbZO7VGyqhaSR1utinRJz3L.jpg | ['Filmfour'] | ['France', 'Germany', 'India', 'United Kingdom'] | 2001-09-23 | 0.0 | 86.0 | ['हिन्दी'] | Released | NaN | The Warrior | 6.3 | 15.0 | 2001.0 | NaN | ['Irrfan Khan', 'Puru Chibber', 'Aino Annuddin', 'Manoj Mishra', 'Nanhe Khan', 'Chander Singh', 'Hemant Maahaor', 'Mandakini Goswami', 'Sunita Sharma', 'Shaukat Baig', 'Gori Shanker', 'Prabhuram', 'Wagaram', 'Ajai Rohilla', 'Noor Mani', 'Sitaram Panchal', 'Chander Prakash Vyas', 'Sanjal', 'Anupam Shyam', 'Amit Kumar', 'Damayanti Marfatia', 'Trilok Singh', 'Pushpa Negi', 'Karuna Sarah Davis', 'Rakesh Mehra', 'Anuradha Advanti', 'Ismail Bashey', 'Madhu'] | ['Asif Kapadia', 'Asif Kapadia', 'Tim Miller'] | 4 |
| 14 | NaN | 0.0 | ['Comedy', 'Drama'] | 11115 | en | As an ex-gambler teaches a hot-shot college kid some things about playing cards, he finds himself pulled into the world series of poker, where his protégé is his toughest competition. | /kHaBqrrozaG7rj6GJg3sUCiM29B.jpg | ['Andertainment Group', 'Crescent City Pictures', 'Tag Entertainment'] | ['United States of America'] | 2008-01-29 | 0.0 | 85.0 | ['English'] | Released | NaN | Deal | 5.2 | 22.0 | 2008.0 | NaN | ['Burt Reynolds', 'Bret Harrison', 'Shannon Elizabeth', 'Maria Mason', 'Jennifer Tilly', 'Gary Grubbs', 'Charles Durning', 'Caroline Mckinley', 'Brandon Ray Olive', 'Jon Eyez', 'J.D. Evermore'] | ['Eric Strand', 'Peter Rafelson', 'Gil Cates Jr.', 'Gil Cates Jr.', 'Marc Weinstock', 'Tom Harting', 'Jonathan Cates', 'Frank Zito', 'Michael Amato', 'Scott Lazar', 'Albert J. Salzer', 'Marc Weinstock'] | 4 |
| 15 | NaN | 0.0 | ['Comedy', 'Drama'] | 265189 | sv | While holidaying in the French Alps, a Swedish family deals with acts of cowardliness as an avalanche breaks out. | /rGMtc9AtZsnWSSL5VnLaGvx1PI6.jpg | ['Motlys', 'Coproduction Office', 'Film i Väst'] | ['Norway', 'Sweden', 'France'] | 2014-08-15 | 1359497.0 | 118.0 | ['Français', 'Norsk', 'svenska', 'English'] | Released | NaN | Force Majeure | 6.8 | 255.0 | 2014.0 | inf | ['Lisa Loven Kongsli', 'Johannes Bah Kuhnke', 'Clara Wettergren', 'Vincent Wettergren', 'Brady Corbet', 'Kristofer Hivju', 'Fanni Metelius', 'Karin Myrenberg', 'Johannes Moustos'] | ['Ruben Östlund', 'Ruben Östlund', 'Philippe Bober', 'Erik Hemmendorff', 'Marie Kjellson', 'Katja Adomeit', 'Marina Perales', 'Yngve Sæther', 'Ola Fløttum', 'Fredrik Wenzel', 'Jacob Secher Schulsinger', 'Katja Wik', 'Josefin Åsberg', 'Josefin Åsberg', 'Pia Aleborg'] | 4 |
| 16 | NaN | 0.0 | ['Comedy'] | 97995 | en | After breaking a mirror in his home, superstitious Max tries to avoid situations which could bring bad luck but in doing so, causes himself the worst luck imaginable. | /4J6Ai4C5YRgfRUTlirrJ7QsmJKU.jpg | ['Max Linder Productions'] | ['United States of America'] | 1921-02-06 | 0.0 | 62.0 | ['English'] | Released | NaN | Seven Years Bad Luck | 5.6 | 4.0 | 1921.0 | NaN | ['Max Linder', 'Alta Allen', 'Ralph McCullough', 'Betty K. Peterson', 'F.B. Crayne', 'Chance Ward', 'Hugh Saxon', 'Thelma Percy', 'C.E. Anderson', 'Lola Gonzales', 'Harry Mann', 'Joe Martin'] | ['Charles Van Enger', 'Max Linder', 'Max Linder', 'Max Linder'] | 4 |
| 17 | NaN | 0.0 | ['Crime', 'Drama', 'Thriller'] | 5511 | fr | Hitman Jef Costello is a perfectionist who always carefully plans his murders and who never gets caught. | /cvNW8IXigbaMNo4gKEIps0NGnhA.jpg | ['Fida cinematografica', 'Compagnie Industrielle et Commerciale Cinématographique (CICC)', 'TC Productions', 'Filmel'] | ['France', 'Italy'] | 1967-10-25 | 39481.0 | 105.0 | ['Français'] | Released | There is no solitude greater than that of the Samurai | Le Samouraï | 7.9 | 187.0 | 1967.0 | inf | ['Alain Delon', 'François Périer', 'Nathalie Delon', 'Cathy Rosier', 'Catherine Jourdan', 'Jacques Leroy', 'Michel Boisrond', 'Robert Favart', 'Jean-Pierre Posier', 'Roger Fradet', 'Carlo Nell', 'Robert Rondo', 'André Salgues', 'André Thorent', 'Jacques Deschamps', 'Georges Casati', 'Jacques Léonard', 'Pierre Vaudier', 'Maurice Magalon', 'Gaston Meunier', 'Jean Gold', 'Georges Billy', 'Ari Aricardi', 'Guy Bonnafoux', 'Humberto Catalano', 'Carl Lechner', 'Maria Maneva'] | ['Henri Decaë', 'Raymond Borderie', 'Jean-Pierre Melville', 'Jean-Pierre Melville', 'Jean-Pierre Melville', 'François de Roubaix', 'Georges Pellegrin', 'Georges Pellegrin', 'Eugène Lépicier', 'Monique Bonnot', 'Yolande Maurette', 'Joan McLeod'] | 4 |
| 19 | NaN | 0.0 | ['Documentary'] | 84198 | en | Using personal stories, this powerful documentary illuminates the plight of the 49 million Americans struggling with food insecurity. A single mother, a small-town policeman and a farmer are among those for whom putting food on the table is a daily battle. | /jn8L1QdWWX5c0NUOLjzaSXtZrbt.jpg | [] | ['United States of America'] | 2012-03-22 | 0.0 | 84.0 | ['English'] | Released | One Nation. Underfed. | A Place at the Table | 6.9 | 7.0 | 2012.0 | NaN | ['Jeff Bridges', 'Tom Colicchio', 'Mariana Chilton', 'Ken Cook', 'Barbie Izquierdo', 'James McGovern', 'Marion Nestle', 'Raj Patel', 'Janet Poppendieck'] | ['Kristi Jacobson', 'Lori Silverbush'] | 4 |
| 21 | NaN | 0.0 | ['Drama', 'Comedy'] | 168538 | en | In Zola's Paris, an ingenue arrives at a tony bordello: she's Nana, guileless, but quickly learning to use her erotic innocence to get what she wants. She's an actress for a soft-core filmmaker and soon is the most popular courtesan in Paris, parlaying this into a house, bought for her by a wealthy banker. She tosses him and takes up with her neighbor, a count of impeccable rectitude, and with the count's impressionable son. The count is soon fetching sticks like a dog and mortgaging his lands to satisfy her whims. | /pg4PUHRFrgNfACHSh5MITQ2gYch.jpg | ['Cannon Group', 'Metro-Goldwyn-Mayer (MGM)'] | [] | 1983-06-13 | 0.0 | 92.0 | [] | Released | NaN | Nana, the True Key of Pleasure | 4.7 | 3.0 | 1983.0 | NaN | ['Katya Berger', 'Jean-Pierre Aumont', 'Yehuda Efroni', 'Yehuda Efroni', 'Massimo Serato', 'Debra Berger', 'Shirin Taylor', 'Annie Belle', 'Paul Müller', 'Marcus Beresford', 'Robert Bridges', 'Tom Felleghy'] | ['Marc Behm', 'Émile Zola', 'Dan Wolman'] | 4 |
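
A table like the one above can be approximated by counting identical rows and keeping those that occur more than once. The sketch below uses only the Python standard library and a handful of illustrative tuples (title, id, original_language) taken from rows in this report; the helper name `duplicate_counts` is hypothetical. It counts every occurrence of a repeated row, which is an assumption: the report does not say whether `# duplicates` counts all occurrences or only the extra copies. Rows must be hashable, so list-valued cells would need converting to tuples or strings first.

```python
from collections import Counter


def duplicate_counts(rows):
    """Return {row: occurrences} for rows that appear more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Illustrative rows only; the repetition pattern here is made up for the demo.
rows = [
    ("Blackout", 141971, "fi"),
    ("Blackout", 141971, "fi"),
    ("Deal", 11115, "en"),
    ("Queerama", 461257, "en"),
    ("Deal", 11115, "en"),
    ("Blackout", 141971, "fi"),
]

dups = duplicate_counts(rows)

# Sort by frequency, highest first, to mirror a "most frequently occurring" view.
most_common = sorted(dups.items(), key=lambda item: item[1], reverse=True)
for row, n in most_common:
    print(row, n)
```
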